ABSTRACT

A  class  of  representations  for  the  least  squares  estimator  is  presented 
and  their  applications  sketched.  Partly  motivated  by  one  such  representation, 
we propose a class of weighted jackknife estimators of variance of the least
squares  estimator  by  deleting  any  fixed  number  of  observations  at  a  time. 

These  estimators  are  unbiased  for  homoscedastic  errors  and  a  special  case,  the 
delete-one  jackknife  variance  estimator,  is  almost  unbiased  for  hetero- 
scedastic  errors.  The  method  is  extended  in  various  ways,  including  the  use 
of  the  jackknife  histogram,  for  variance  and  interval  estimation  with 
nonlinear  parameters.  Three  bootstrap  methods  are  considered.  It  is  shown 
that  none  of  them  has  the  robustness  property  enjoyed  by  the  (weighted) 
delete-one  jackknife.  Subset  sampling  with  variable  subset  size  is  also 
considered.  Several  bias-reducing  estimators  are  proposed.  They  are 
motivated  by  the  observation  that  bias-reduction  is  mathematically  equivalent 
to  unbiased  estimation  of  variance.  Some  simulation  results  on  estimating  the 
ratio of two normal parameters are reported.

SIGNIFICANCE AND EXPLANATION

The Quenouille-Tukey jackknife is an old tool for bias reduction and nonparametric variance estimation. Recently Efron introduced the bootstrap method as a more versatile tool. It seems to have the potential to be useful in many kinds of problems involving estimation of error. These tools are not yet well developed for regression models. We propose a class of weighted
jackknife  methods  that  recompute  the  least  squares  estimates  by  deleting  any 
fixed  number  of  observations  at  a  time.  The  key  step  is  to  weight  each  subset 
least  squares  estimate  with  the  determinant  of  the  Fisher  information  matrix 
of  the  subset.  Some  desirable  properties  of  the  procedures  are  proved.  For 
nonlinear  parameters,  the  methods  are  useful  for  bias  reduction  and  variance 
estimation.  Since  we  do  not  restrict  to  the  classical  delete-one  jackknife, 
confidence intervals can be constructed from the histogram of some proper
estimates  from  the  resamples.  The  utility  of  the  classical  jackknife  method 
is  broadened  with  this  new  tool.  On  the  other  hand  we  show  that  the  existing 
bootstrap methods may not work so well in the regression situation. Some

JACKKNIFE AND BOOTSTRAP INFERENCE IN REGRESSION AND A
CLASS OF REPRESENTATIONS FOR THE LSE

C.  F.  Jeff  Wu 

1.  Introduction 

In the first part of this paper we show that the full-data least squares estimate (LSE) $\hat\beta$ can be represented as a weighted average of the LSE's $\hat\beta_s$ from all subsets $s$ of a fixed size, with the weight proportional to the determinant of the $X_s^T X_s$ matrix associated with the subset (Theorem 2), i.e.,

(1.1)  $\hat\beta = \sum_s w_s \hat\beta_s$ (sum over all subsets of a fixed size),  $w_s \propto |X_s^T X_s|$,  $\sum_s w_s = 1$.

Instead of averaging over all subsets of a fixed size, we may consider drawing samples from the full data according to a resampling scheme, and computing the LSE and the determinant of the corresponding $X^T X$ matrix from each such sample. One main result (Theorem 1) states that the above representation still holds for any resampling method that is symmetric and nondegenerate with positive probability (Assumption B of Section 3), including the jackknife and the bootstrap. Several implications of the representation result are sketched in Section 3. A major one is in suggesting a new class of robust regression estimators. The details are in Section 4.

The representation (1.1) involves a linear function of $\hat\beta_s$. To estimate the variance-covariance matrix (henceforth abbreviated as variance) of $\hat\beta$, it seems natural to look at a quadratic function of $\hat\beta_s$. A quadratic extension of (1.1) is

(1.2)  $\xi \sum_s w_s (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T$,  $w_s$ as in (1.1),

where the summation is over all subsets of size $r$. It turns out that the choice $\xi = (r-k+1)/(n-r)$, where $n$ is the number of observations and $k$ the number of regression parameters, makes (1.2) an unbiased estimator of the variance of $\hat\beta$ if the errors are uncorrelated with mean zero and constant variance (Theorem 3). This estimator is denoted as $v_{J,r}$ in (5.1).

The second and major part of this paper deals with the jackknife and bootstrap resampling methods for variance and interval estimation and bias reduction. The method (1.2) can be viewed as a weighted jackknife by deleting every subset of size $n-r$ from the full data. The purpose of the adjustment factor $\xi = (r-k+1)/(n-r)$ in (1.2) is to make the distance of $\sqrt{\xi}(\hat\beta_s - \hat\beta)$ match the distance of $\hat\beta - \beta$. For example, if $r = n-1$ (delete-one-at-a-time), $\hat\beta_s$ is too close to $\hat\beta$, and it is necessary to multiply the weighted sum of squares in (1.2) by a large factor $\xi = n-k$. Further attention is paid to the two extreme choices of the subset size $r$. If $r = k$, the number of regression parameters, it turns out that $v_{J,k}$ is identical to the usual variance estimator (obtained by assuming equal variances). Theorem 4 provides the details, including the necessary modification of the definition (1.2) when some subsets are associated with singular $X_s^T X_s$ matrices. The other extreme is the delete-one jackknife, $r = n-1$. Our proposal is closely related to a delete-one jackknife proposed by Hinkley (1977). The main difference is that Hinkley's estimator uses weights proportional to the square of $|X_s^T X_s|$ and is therefore a biased estimator of the variance of $\hat\beta$. Both delete-one jackknife variance estimators are robust against error variance heterogeneity in that their biases converge to zero as $n \to \infty$ under (the same) weak regularity conditions. Hinkley's estimator does not fare well in the empirical study reported in Section 10.

In practice the resampling methods of inference are only used in situations where no closed form of the variance (or other measures of variability) of the point estimator is available. In Section 7 we consider extensions of the above method to parameters $\theta = g(\beta)$ which are nonlinear functions of the regression parameters $\beta$. An obvious extension is to replace $\hat\beta$, $\hat\beta_s$ in (1.2) by their counterparts $g(\hat\beta)$, $g(\hat\beta_s)$. The scale factor $\xi$ is applied after the nonlinear transformation $g$, (7.1). Another approach is to incorporate this scale adjustment internally before applying the transformation $g$, (7.2). To obtain confidence intervals for $\theta$ without computing variance estimates, we propose a jackknife percentile method through the construction of a weighted empirical distribution function of some estimates of $\theta$ based on the same subsets with the same weight $w_s$ in (1.1). Here we find it more natural to estimate $\theta$ with the internal adjustment method. The jackknife percentile method is similar in spirit to Efron's (1982) bootstrap percentile method. There is some theoretical advantage in using the percentile method since the possible skewness in the original point estimate $\hat\theta$ will be reflected in the histogram of the resampled estimates. Extensions to nonlinear regression models are briefly outlined. The jackknife methodology has long been associated with the delete-one jackknife. It has mainly been used for variance estimation and bias reduction. The method proposed here overcomes these limitations by allowing the deletion of more than one observation and the construction of the jackknife histogram. Further discussion is given in Section 7.

Other  resampling  methods  are  studied  in  Section  8.  The  subset  sampling  method  is  an 
extension of the jackknife by allowing different subset sizes. The variance estimator
(1.2)  is  extended  to  this  situation.  Three  bootstrap  methods  of  variance  estimation  are 
considered.  Two  of  them  do  not  in  general  give  unbiased  variance  estimators  in  the  equal 
variance  case,  as  is  shown  by  a  counterexample.  The  third  one  by  bootstrapping  the 
residuals  is  known  to  be  identical  to  the  usual  variance  estimator  (8.19)  in  the  case  of 
linear  parameters.  The  latter  estimator  is  unbiased  in  the  equal  variance  case  but  is 
biased  for  unequal  variances. 

The issue of bias reduction is studied in Section 9. It is shown that bias reduction is achievable if and only if the variance of $\hat\beta$ can be estimated unbiasedly (apart from a lower order term). Based on this connection, several estimators of the bias of $\hat\theta$ are proposed as natural counterparts of the variance estimators considered before. Conditions under which these estimators achieve bias reduction are given in Theorems 7, 8 and Corollaries 3, 4.

Several jackknife and bootstrap methods are compared in a simulation study, assuming a quadratic regression model. Criteria for the simulation comparison include the bias of estimating the variance-covariance matrix of $\hat\beta$, the bias of estimating the nonlinear parameter $\theta = -\beta_1/(2\beta_2)$, and the coverage probability and length of the interval estimators of $\theta$. For the last two criteria, Fieller's method and the t-interval with the linearization variance estimator are included for comparison. The simulation results are summarized at the end of Section 10. Further questions are raised in Section 11.

2.  Some matrix lemmas

For a matrix $X$ of order $n \times k$, let $X_s$ be its $r \times k$ submatrix consisting of the $i_1$th, ..., $i_r$th rows, $s = (i_1,\ldots,i_r)$, and $X^{(j)}$ be the $n \times (k-1)$ submatrix obtained from deleting the $j$th column of $X$. For a square matrix $A$ of order $k$, its adjoint is defined as the $k \times k$ matrix

(2.1)  $\mathrm{adj}\, A = [c_{ij}]$,  $1 \le i, j \le k$,

with its $(i,j)$ element $c_{ij} = (-1)^{i+j} M_{ji}$, where $M_{ji}$ is the determinant of the $(k-1) \times (k-1)$ submatrix of $A$ with the $j$th row and $i$th column deleted. Let $|A|$, $A^{-1}$, $A^T$ be respectively the determinant, inverse and transpose of $A$. Recall $A^{-1} = \mathrm{adj}\, A / |A|$, if $A^{-1}$ exists.


Lemma 1.  Let $X$ and $Z$ be $n \times k$ matrices, $n \ge k$. Then

(i)   $|X^T Z| = \sum_{s \in S_k} |X_s| |Z_s|$,

(ii)  $|X^T Z| = \binom{n-k}{r-k}^{-1} \sum_{s \in S_r} |X_s^T Z_s|$ for any $r \ge k$,

where

(2.2)  $S_r$ = all subsets $s$ of size $r$.

(Note that $X_s$ and $Z_s$ are square matrices for $s \in S_k$.)

Proof: Lemma 1(i) is in Noble (1969, p. 226). Lemma 1(ii) is obtained by applying Lemma 1(i) to each term $|X_s^T Z_s|$ and to $|X^T Z|$.

□
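As a numerical illustration of Lemma 1, the following sketch checks the identity in (ii) on a small random example (with $r = k$ it reduces to (i)). The function name `subset_determinant_sum` and the random test matrices are illustrative assumptions, not part of the paper.

```python
import itertools
import math

import numpy as np


def subset_determinant_sum(X, Z, r):
    """Right-hand side of Lemma 1(ii): normalized sum of |X_s^T Z_s| over |s| = r."""
    n, k = X.shape
    total = 0.0
    for s in itertools.combinations(range(n), r):
        rows = list(s)
        total += np.linalg.det(X[rows].T @ Z[rows])
    # Each subset of size k is contained in C(n-k, r-k) subsets of size r.
    return total / math.comb(n - k, r - k)


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))   # hypothetical 6 x 2 matrices
Z = rng.normal(size=(6, 2))
lhs = np.linalg.det(X.T @ Z)
for r in range(2, 7):
    print(r, np.isclose(lhs, subset_determinant_sum(X, Z, r)))
```

Enumerating all subsets is only feasible for small $n$; the sketch is meant as a sanity check of the combinatorial factor, not as a computational method.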

Lemma 2.  Let $X$ be an $n \times k$ matrix, $n \ge k$. Then

(2.3)  $\mathrm{adj}\, X^T X = \binom{n-k+1}{r-k+1}^{-1} \sum_{s \in S_r} \mathrm{adj}\, X_s^T X_s$,  $r \ge k$.

If $X_s^T X_s$ are nonsingular for all $s \in S_r$,

(2.4)  $|X^T X| (X^T X)^{-1} = \binom{n-k+1}{r-k+1}^{-1} \sum_{s \in S_r} |X_s^T X_s| (X_s^T X_s)^{-1}$.

Proof: From the definition, the $(i,j)$ elements of $\mathrm{adj}\, X^T X$ and $\mathrm{adj}\, X_s^T X_s$ are $(-1)^{i+j} |X^{(i)T} X^{(j)}|$ and $(-1)^{i+j} |X_s^{(i)T} X_s^{(j)}|$ respectively, where $X_s^{(i)}$ is obtained from deleting the $i$th column of $X_s$. Therefore (2.3) is equivalent to

       $|X^{(i)T} X^{(j)}| = \binom{n-k+1}{r-k+1}^{-1} \sum_{s \in S_r} |X_s^{(i)T} X_s^{(j)}|$,

which follows from Lemma 1(ii), noting that $X^{(i)T} X^{(j)}$ and $X_s^{(i)T} X_s^{(j)}$ are both of order $k-1$. (2.4) follows from (2.3) since $\mathrm{adj}\, A = |A| A^{-1}$.

□

Lemma 3.  Let $Z$ be an $r \times k$ matrix, $r \ge k$. If $|Z^T Z| = 0$, then $|Z^T W| = 0$ for any $r \times k$ matrix $W$.

Proof: Since $|Z^T Z| = 0$, $Z$ is not of full rank, which implies $Z^T W$ is singular and $|Z^T W| = 0$.  □


3.  Representations  of  the  least  squares  estimator 

To motivate the general representation result, let us first consider the simple linear regression model

(3.1)  $y_i = \alpha + \beta x_i + e_i$,  $i = 1, \ldots, n$,

with $E e_i = 0$, $E e_i^2 = \sigma^2$ and $\mathrm{cov}(e_i, e_j) = 0$ for $i \ne j$. The ordinary least squares estimator (LSE) $\hat\beta$ of $\beta$ has several equivalent expressions,

       $\hat\beta = \sum_{i=1}^n (y_i - \bar y)(x_i - \bar x) / \sum_{i=1}^n (x_i - \bar x)^2$

(3.2)      $= \sum_{i<j} (y_i - y_j)(x_i - x_j) / \sum_{i<j} (x_i - x_j)^2$

(3.3)      $= \sum_{i<j} u_{ij} \hat\beta_{ij}$,

where

       $\hat\beta_{ij} = (y_i - y_j)/(x_i - x_j)$

are the pairwise slopes for $x_i \ne x_j$ and

       $u_{ij} = (x_i - x_j)^2 / \sum_{i<j} (x_i - x_j)^2$.

To validate the step from (3.2) to (3.3), $u_{ij} \hat\beta_{ij}$ in (3.3) is defined to be zero for $x_i = x_j$. One can now interpret $\hat\beta$ as a weighted average of all the least squares estimates $\hat\beta_{ij}$ based on the $(i,j)$ pairs of observations, with the weight proportional to $(x_i - x_j)^2$, which happens to be the determinant of $X_{ij}^T X_{ij}$, where

       $X_{ij} = \begin{pmatrix} 1 & x_i \\ 1 & x_j \end{pmatrix}$

is the design matrix corresponding to the $(i,j)$ observations. It seems natural to guess the following extension for the general linear model: the LSE based on the full data set is equal to a weighted average of the LSE's based on all subsets of fixed size with the weight proportional to the determinant of the $X^T X$ matrix corresponding to the subset. In fact a more general result will be shown to be true.
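The pairwise-slope representation of the least squares slope can be checked numerically. The sketch below is a minimal illustration; the function name `slope_from_pairs` and the simulated data are assumptions made for the example.

```python
import numpy as np


def slope_from_pairs(x, y):
    """Weighted average of pairwise slopes with weights (x_i - x_j)^2."""
    i, j = np.triu_indices(len(x), k=1)      # all pairs with i < j
    dx, dy = x[i] - x[j], y[i] - y[j]
    keep = dx != 0                            # tied pairs contribute zero
    weights = dx[keep] ** 2
    slopes = dy[keep] / dx[keep]
    return np.sum(weights * slopes) / np.sum(weights)


rng = np.random.default_rng(0)
x = rng.normal(size=12)                       # hypothetical design points
y = 1.0 + 2.0 * x + rng.normal(size=12)       # hypothetical responses
print(np.isclose(slope_from_pairs(x, y), np.polyfit(x, y, 1)[0]))
```

Dropping the tied pairs leaves the denominator unchanged because their weights are zero, which is the convention adopted for the step from (3.2) to (3.3).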

Throughout the paper we assume the following general linear model:

(3.4)  $y_i = x_i^T \beta + e_i$,  $i = 1, \ldots, n$,

where $x_i$ is a $k \times 1$ deterministic vector, $\beta$ is the $k \times 1$ vector of parameters and $e_i$ are uncorrelated errors with mean zero and variance $\sigma_i^2$. Writing $y = (y_1,\ldots,y_n)^T$, $e = (e_1,\ldots,e_n)^T$ and $X = [x_1,\ldots,x_n]^T$, (3.4) can be rewritten as

(3.5)  $y = X\beta + e$,  $\mathrm{Var}(e) = \Sigma = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$.

We always assume $X^T X$ is nonsingular. The ordinary least squares estimator (LSE) based on the full data $(y, X)$ is

(3.6)  $\hat\beta = (X^T X)^{-1} X^T y$.

In Theorem 1 $\hat\beta$ is related to the LSE's based on values "resampled" from the full data $(y, X)$. A brief discussion of resampling procedures is given next.

The full data, $z_1 = (y_1, x_1), \ldots, z_n = (y_n, x_n)$, are thought of as being observed and fixed. A resample of $\{z_i\}_1^n$ is a reweighted version of $\{z_i\}_1^n$ with weight $P_i^* \ge 0$. The vector $P^* = (P_1^*,\ldots,P_n^*)$ is called a resampling vector. For each $P^*$, the corresponding least squares estimate $\hat\beta^*$ is based on $P_i^*$ "copies" of $z_i$, i.e.,

(3.7)  $\hat\beta^* = (X^T D^* X)^{-1} X^T D^* y$,  $D^* = \mathrm{diag}(P_1^*,\ldots,P_n^*)$

is a weighted least squares estimate with weight proportional to $P_i^*$. Let "*" denote the joint distribution of $\{P_i^*\}_1^n$ under a resampling procedure. The expectation under repeated sampling according to the given resampling procedure is denoted by $E_*$.

Assumptions on the resampling procedure *:

(A)  $E_*(\prod_{j=1}^k P_{i_j}^*) = a > 0$, independent of the subset $(i_1,\ldots,i_k)$ of size $k$, $k$ = # of parameters in (3.5).

(B)  1. The $n$ random variables $\{P_i^*\}_1^n$ are exchangeable.

     2. $\mathrm{Prob}_*$(support size of $P^* \ge k$) $> 0$, where the support size of $P^*$ is the total number of $i$'s with $P_i^* > 0$.

It is easy to see that (A) is implied by (B). It will be shown after Theorem 1 that several important resampling procedures satisfy the assumption (B).

Our first major result states that the full-data LSE $\hat\beta$ is a weighted average of the resampled-data LSE's $\hat\beta^*$ with weight proportional to $|X^T D^* X|$ for any resampling procedure satisfying (A).

Theorem 1.  For any resampling method * satisfying the assumption (A), the LSE $\hat\beta$ based on the full data can be represented as

(3.8)  $\hat\beta = E_*(|X^T D^* X| \hat\beta^*) / E_*|X^T D^* X|$,

where $|X^T D^* X| \hat\beta^*$ is defined to be zero if $X^T D^* X$ is singular.


Proof: First consider the $D^*$ with nonsingular $X^T D^* X$. Since $\hat\beta^*$ is the solution to the equation $(X^T D^* X)\beta = X^T D^* y$, from Cramer's rule (Noble, 1969, p. 209), the $j$th element of $\hat\beta^*$ is equal to the ratio of the determinant of the matrix obtained by replacing the $j$th column of $X^T D^* X$ by the vector $X^T D^* y$ over the determinant of $X^T D^* X$. Notationally,

       $\hat\beta_j^* = |X^T D^* X^{(j)}(y)| / |X^T D^* X|$,

where $X^{(j)}(y)$ is the $n \times k$ matrix obtained by replacing the $j$th column of $X$ by $y$. This establishes

(3.9)  $|X^T D^* X^{(j)}(y)| = |X^T D^* X| \hat\beta_j^*$ for nonsingular $X^T D^* X$.

For singular $X^T D^* X$,

(3.10)  $|X^T D^* X^{(j)}(y)| = 0$

follows from Lemma 3. From (3.9) and (3.10), the $j$th element of the right hand side of (3.8) equals

(3.11)  $E_*|X^T D^* X^{(j)}(y)| / E_*|X^T D^* X| = E_* \sum' |X_s| |D_s^*| |X_s^{(j)}(y)| / E_* \sum' |X_s|^2 |D_s^*|$,

where $\sum'$ is summation over $s$ in $S_k$, $D_s^* = \mathrm{diag}(P_{i_1}^*,\ldots,P_{i_k}^*)$ is the diagonal submatrix of $D^*$ corresponding to the subset $s = (i_1,\ldots,i_k)$, and (3.11) follows from Lemma 1(i). Since $E_*|D_s^*| = E_*(\prod_{j=1}^k P_{i_j}^*) = a > 0$ is independent of $s$ from the assumption (A), (3.11) equals

       $\sum' |X_s| |X_s^{(j)}(y)| / \sum' |X_s|^2$,

which is equal to the $j$th element of $\hat\beta$. The proof is completed.


An important resampling procedure that satisfies (B) is the bootstrap (Efron, 1979). A simple random sample $\{z_j^*\}_1^m$ of fixed size $m$ is drawn with replacement from the observed sample $z_1,\ldots,z_n$. Let $P_i^* = \#\{z_j^* = z_i,\ j = 1,\ldots,m\}$. Then $P^* = (P_1^*,\ldots,P_n^*)$ follows a multinomial distribution,

(3.12)  $P^* \sim \mathrm{Mult}_n(m, n^{-1}\mathbf{1})$,  $\mathbf{1} = (1,\ldots,1)$,

with $m$ independent draws on $n$ categories each having probability $1/n$. For $m \ge k$, (3.12) satisfies (B) and the representation result of Theorem 1 applies to the bootstrap method. Note that the bootstrap sample size $m$ need not be the same as the original sample size $n$.
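For a very small problem the bootstrap expectation in (3.8) can be computed exactly by enumerating all $n^m$ equally likely draws, which gives a direct check of Theorem 1. The sketch below does this; the function name `bootstrap_representation`, the rank test used to detect singular $X^T D^* X$, and the simulated data are assumptions of the example.

```python
import itertools

import numpy as np


def bootstrap_representation(X, y, m):
    """Ratio E*(|X'D*X| beta*) / E*|X'D*X| by exact enumeration of n^m draws."""
    n, k = X.shape
    numerator = np.zeros(k)
    denominator = 0.0
    for draw in itertools.product(range(n), repeat=m):
        P = np.bincount(draw, minlength=n)        # resampling vector P*
        gram = X.T @ (P[:, None] * X)             # X^T D* X
        if np.linalg.matrix_rank(gram) < k:
            continue                               # singular terms are defined as zero
        det = np.linalg.det(gram)
        numerator += det * np.linalg.solve(gram, X.T @ (P * y))
        denominator += det
    # All draws are equally likely, so the common factor n^(-m) cancels.
    return numerator / denominator


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=5)])   # hypothetical design
y = rng.normal(size=5)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(bootstrap_representation(X, y, m=3), beta_hat))
```

Here $m = 3$ differs from $n = 5$, consistent with the remark that the bootstrap sample size need not equal the original sample size.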

Another example is the jackknife method, which computes the LSE by deleting any subset of size $d$ or equivalently by retaining any subset of size $r = n-d$. The jackknife resampling with fixed size $r$ is defined by

(3.13)  $\mathrm{Prob}_*(P_i^* = 1$ for $i \in s$, $= 0$ for $i \notin s) = \binom{n}{r}^{-1}$ for all $s \in S_r$.

Let $\hat\beta_s$ denote the LSE based on the subset of observations $z_{i_j}$, $i_j \in s$,

(3.14)  $\hat\beta_s = (X_s^T X_s)^{-1} X_s^T y_s$,

where $y_s = (y_{i_1},\ldots,y_{i_r})^T$. As a special case of Theorem 1, we have

Theorem 2.  For any $r \ge k$,

(3.15)  $\hat\beta = \sum_{s \in S_r} |X_s^T X_s| \hat\beta_s / \sum_{s \in S_r} |X_s^T X_s| = \binom{n-k}{r-k}^{-1} \sum_{s \in S_r} |X_s^T X_s| \hat\beta_s / |X^T X|$,

where $|X_s^T X_s| \hat\beta_s$ is defined to be zero for singular $X_s^T X_s$.

The second identity of (3.15) follows from Lemma 1(ii). Theorem 2 is the extension anticipated in the beginning of this section.
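The subset representation of Theorem 2 can be illustrated by brute-force enumeration. In the sketch below the function name `lse_from_subsets` and the use of a rank test to skip singular subsets are assumptions of the example; the skipped terms correspond to those defined to be zero in the theorem.

```python
import itertools

import numpy as np


def lse_from_subsets(X, y, r):
    """Weighted average of subset LSEs with weights |X_s^T X_s|, |s| = r."""
    n, k = X.shape
    numerator = np.zeros(k)
    denominator = 0.0
    for s in itertools.combinations(range(n), r):
        rows = list(s)
        Xs, ys = X[rows], y[rows]
        if np.linalg.matrix_rank(Xs) < k:
            continue                      # singular X_s^T X_s: term defined as zero
        gram_s = Xs.T @ Xs
        det = np.linalg.det(gram_s)
        numerator += det * np.linalg.solve(gram_s, Xs.T @ ys)
        denominator += det
    return numerator / denominator


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(7), rng.normal(size=7)])   # hypothetical design
y = rng.normal(size=7)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
for r in (2, 4, 6):
    print(r, np.allclose(lse_from_subsets(X, y, r), beta_hat))
```

The same call also works when some subsets are singular, for example when the design contains tied rows, which is the case the zero convention is meant to cover.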

In general the bootstrap and the jackknife provide different representations of $\hat\beta$. But when the bootstrap sample size is $k$, $|X^T D^* X| = 0$ for any bootstrap sample with support size less than $k$. The remaining bootstrap samples with support size $k$ are identical to the jackknife samples of size $k$. Therefore the bootstrap resampling and the jackknife resampling, both with resample size $k$, give the same representation of $\hat\beta$.

Theorem 2 was proved by Subrahmanyan (1972) for the special case $r = k$ and extended to general $r$ by Hoerl and Kennard (1980). (They were kindly brought to my attention by D. B. Rubin and W. Y. Tsai.) The singularity problem of $X_s^T X_s$ was not rigorously handled in their theorems and proofs. The same problem will surface again in variance estimation (Section 5), where such negligence will lead to incorrect results.

The subset LSE $\hat\beta_s$ is related to the full-data LSE $\hat\beta$ by (Bingham, 1977; Cook and Weisberg, 1982, p. 136)

(3.16)  $\hat\beta - \hat\beta_s = (X^T X)^{-1} X_{\bar s}^T r_{\bar s, s} = (X_s^T X_s)^{-1} X_{\bar s}^T r_{\bar s}$,

where $\bar s$ is the complement of $s$ and

(3.17)  $r_{\bar s} = y_{\bar s} - X_{\bar s} \hat\beta$,  $r_{\bar s, s} = y_{\bar s} - X_{\bar s} \hat\beta_s$

are the vectors of residuals in $\bar s$ from fitting the full-data LSE $\hat\beta$ and the subset LSE $\hat\beta_s$ respectively. Theorem 2 can be restated in terms of the residuals $r_{\bar s}$, $r_{\bar s, s}$ associated with the discarded subsets $\bar s$.
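The two expressions relating the subset LSE to the full-data LSE can be verified numerically. The sketch below computes the difference between the two estimates directly and through the residuals on the discarded observations; the function name `subset_shift_identities` and the simulated data are assumptions of the example.

```python
import numpy as np


def subset_shift_identities(X, y, s):
    """Return beta_hat - beta_s and its two residual-based expressions."""
    n = X.shape[0]
    keep = np.asarray(s)
    drop = np.setdiff1d(np.arange(n), keep)        # complement of s
    gram = X.T @ X
    beta_full = np.linalg.solve(gram, X.T @ y)
    Xs, ys = X[keep], y[keep]
    gram_s = Xs.T @ Xs
    beta_s = np.linalg.solve(gram_s, Xs.T @ ys)
    Xd, yd = X[drop], y[drop]
    resid_full = yd - Xd @ beta_full               # residuals from the full-data fit
    resid_sub = yd - Xd @ beta_s                   # residuals from the subset fit
    direct = beta_full - beta_s
    via_full_gram = np.linalg.solve(gram, Xd.T @ resid_sub)
    via_subset_gram = np.linalg.solve(gram_s, Xd.T @ resid_full)
    return direct, via_full_gram, via_subset_gram


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(8), rng.normal(size=(8, 2))])   # hypothetical design
y = rng.normal(size=8)
a, b, c = subset_shift_identities(X, y, s=[0, 2, 3, 5, 7])
print(np.allclose(a, b), np.allclose(a, c))
```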


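Corollary 1.  For $r_{\bar s}$ and $r_{\bar s, s}$ defined in (3.17),

(3.18)  $\sum_{s \in S_r} |X_s^T X_s| X_{\bar s}^T r_{\bar s, s} = \sum_{s \in S_r} |X_s^T X_s| (X_s^T X_s)^{-1} X_{\bar s}^T r_{\bar s} = 0$,

where the terms with singular $X_s^T X_s$ are defined to be zero.

For $r = n-1$ (delete-one jackknife), (3.18) reduces to the familiar normal equation $\sum_{i=1}^n x_i r_i = 0$, $r_i = y_i - x_i^T \hat\beta$. Formula (3.18) may be useful in regression diagnostics when the diagnostic statistics involve deleting more than one observation.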

Theorem 2 can be trivially extended to the weighted least squares estimators. Let $W = \mathrm{diag}(u_1,\ldots,u_n)$ be the diagonal matrix with elements $u_i > 0$ and $W_s$ be its square submatrix corresponding to the set $s$. Let the full-data weighted LSE and subset weighted LSE be denoted by

(3.19)  $\tilde\beta = (X^T W^{-1} X)^{-1} X^T W^{-1} y$,  $\tilde\beta_s = (X_s^T W_s^{-1} X_s)^{-1} X_s^T W_s^{-1} y_s$.

By applying the transformation $W^{-1/2}$ to $X$ and $y$, and $W_s^{-1/2}$ to $X_s$ and $y_s$, Corollary 2 follows from Theorem 2.


Corollary 2.  For $r \ge k$,

(3.20)  $\tilde\beta = \sum_{s \in S_r} |X_s^T W_s^{-1} X_s| \tilde\beta_s / \sum_{s \in S_r} |X_s^T W_s^{-1} X_s|$,

where the terms with singular $X_s^T W_s^{-1} X_s$ are defined to be zero.

Formula (3.20) for $r = k$ is of particular interest, since $\tilde\beta_s$ is identical to the unweighted LSE $\hat\beta_s = X_s^{-1} y_s$ if $X_s^{-1}$ exists. Therefore the weighted LSE $\tilde\beta$ is a convex combination of the unweighted LSE's $\hat\beta_s$ based on the subsets of size $k$. As a consequence, the collection of the weighted LSE's with any positive weight matrix is contained in the bounded convex hull spanned by the finite number of unweighted LSE's based on all subsets of size $k$. Rubin (1978) proved this result and noted its use in proving the convergence of certain iterative reweighted least squares algorithms as was later done in Dempster, Laird and Rubin (1980).

Koenker and Bassett (1978) introduced a concept of "regression quantiles" for the linear models as a natural generalization of the ordinary sample quantiles for the location model. As a special case, the least absolute deviation estimator is the regression median. It turns out that the set of regression quantiles is identical to the convex hull of $\hat\beta_s$ with $s$ in $S_k$ (Theorem 3.1, Koenker and Bassett, 1978). This connection suggests that the representation (3.15) may be relevant in robust regression. In the next section a class of robust regression estimators will be proposed by exploiting this representation.

4.  An  application:  some  new  robust  regression  estimators 

As in the previous section we shall use the simple linear regression model (3.1) to illustrate the main idea. The least squares estimator $\hat\beta$ of the slope parameter, being a weighted average of the pairwise slopes $\hat\beta_{ij}$ (3.3), is not robust in the sense that it can be heavily influenced by a few extreme values of $y$. Theil (1950) and Sen (1968) suggested the (unweighted) median of $\hat\beta_{ij}$ as a more robust estimator. Jaeckel (1972) considered the weighted medians of $\hat\beta_{ij}$. Jaeckel (1972), Scholz (1978), and Sievers (1978) proved that an asymptotically optimal choice of weight is $|x_i - x_j|$. Note that $|x_i - x_j|$ is different from the weight $(x_i - x_j)^2$ in the representation (3.3). From the optimality property of the least squares estimator, the weight $(x_i - x_j)^2$ is optimal among all linear unbiased estimators of $\beta$.

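The representation of $\hat\beta$ in terms of $\hat\beta_{ij}$ suggests a host of robust modifications. An important class is the following weighted trimmed regression estimators. Let $I = \{(i,j): 1 \le i < j \le n,\ x_i \ne x_j\}$ and $|I| = t$. Order the $\hat\beta_{ij}$, $(i,j) \in I$, into $\hat\beta^{(1)} \le \hat\beta^{(2)} \le \cdots \le \hat\beta^{(t)}$. Let $w_{ij}$ be the weight associated with $\hat\beta_{ij}$ and $w_{(i)}$ the corresponding weight associated with $\hat\beta^{(i)}$. The $(\alpha_1, \alpha_2)$-trimmed regression estimator is defined as

(4.1)  $\hat\beta_{tr} = \sum_{i=m_1+1}^{m_2} w_{(i)} \hat\beta^{(i)} / \sum_{i=m_1+1}^{m_2} w_{(i)}$,

where the indices $m_1$ and $m_2$ are determined by the trimming proportions $\alpha_1$ and $\alpha_2$.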

In $\hat\beta_{tr}$, the lower $100\alpha_1\%$ and the upper $100\alpha_2\%$ (according to the weighted empirical distribution of $\{\hat\beta^{(i)}\}$ with weight $w_{(i)}$) of the pairwise slopes $\hat\beta_{ij}$ are trimmed. The trimmed regression estimator (4.1) covers both the least squares estimator and the weighted median estimator as $\alpha_1$ and $\alpha_2$ vary. From the above discussion, for $\alpha_1$, $\alpha_2$ close to zero $w_{ij} = (x_i - x_j)^2$ should be chosen; and for $\alpha_1$, $\alpha_2$ close to 0.5, $w_{ij} = |x_i - x_j|$ should be chosen. The optimal choice of $w_{ij}$ for general $\alpha_1$ and $\alpha_2$ depends on the asymptotic distribution of $\hat\beta_{tr}$, which is beyond the scope of the paper.
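A weighted trimmed mean of pairwise slopes can be sketched as follows. The cut-off rule used here (keep the ordered slopes whose normalized cumulative weight lies in $(\alpha_1, 1-\alpha_2]$) is one possible convention chosen for illustration and is an assumption, as are the function name `trimmed_slope` and the `power` argument selecting the weight $|x_i - x_j|^{\text{power}}$.

```python
import numpy as np


def trimmed_slope(x, y, alpha1, alpha2, power=2):
    """Weighted (alpha1, alpha2)-trimmed mean of pairwise slopes (illustrative cut-off rule)."""
    i, j = np.triu_indices(len(x), k=1)
    dx, dy = x[i] - x[j], y[i] - y[j]
    keep = dx != 0
    slopes = dy[keep] / dx[keep]
    weights = np.abs(dx[keep]) ** power
    order = np.argsort(slopes)
    slopes, weights = slopes[order], weights[order]
    cum = np.cumsum(weights)
    cum = cum / cum[-1]                      # weighted empirical distribution, last value exactly 1
    inside = (cum > alpha1) & (cum <= 1.0 - alpha2)
    # Assumes alpha1 + alpha2 < 1 so that at least one slope is retained.
    return np.sum(weights[inside] * slopes[inside]) / np.sum(weights[inside])


x = np.arange(10.0)                           # hypothetical data with one gross outlier
y = 2.0 * x + 1.0
y[9] += 100.0
print(trimmed_slope(x, y, 0.0, 0.0, power=2))   # no trimming: the least squares slope
print(trimmed_slope(x, y, 0.3, 0.3, power=1))   # trimmed estimate
```

With no trimming and squared-difference weights the function returns the least squares slope, in line with (3.3); with trimming, the slopes involving the perturbed observation are discarded in this example.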

Assume $t = \binom{n}{2}$, i.e. $x_i \ne x_j$ for all $i < j$. The breakdown point (Huber, 1981) of the unweighted $(\alpha, \alpha)$-trimmed regression estimator (4.1) (with $w_{ij} = 1$) is computed as follows. Let $m$ of the $n$ $y_i$ values be perturbed. The percentage of the pairwise slopes not affected by the perturbation is

       $\binom{n-m}{2} / \binom{n}{2} = (1 - \frac{m}{n})(1 - \frac{m}{n-1}) \approx (1-f)^2$,  $f = m/n$,

so the percentage affected is approximately $2f - f^2$, which equals $\alpha$ iff the percentage of the perturbed $y$ values equals $1 - \sqrt{1-\alpha}$,

(4.2)  $f^* = 1 - \sqrt{1-\alpha}$,

which is the desired breakdown point. For small $\alpha$, $f^* \approx \alpha/2$. The breakdown point $f^*$ as a function of $\alpha$ is given in the following table.

   $\alpha$  |  0.5    0.4    0.3    0.2    0.1
   $f^*$     |  0.293  0.225  0.163  0.105  0.051
Notice that the Theil-Sen median regression estimator has a breakdown point 0.293. For general weighted trimmed regression estimators, no simple formula like (4.2) is available since it depends on the particular weight system. Robust regression estimators with high breakdown point are considered in Siegel (1982) and Rousseeuw (1984).
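The breakdown point formula (4.2) is simple to evaluate. The sketch below tabulates it and also computes the exact finite-sample fraction of unaffected pairwise slopes; the function names are illustrative assumptions.

```python
import math


def breakdown_point(alpha):
    """Breakdown point f* = 1 - sqrt(1 - alpha) of the unweighted (alpha, alpha)-trimmed estimator."""
    return 1.0 - math.sqrt(1.0 - alpha)


def unaffected_fraction(n, m):
    """Exact fraction of pairwise slopes untouched when m of n responses are perturbed."""
    return math.comb(n - m, 2) / math.comb(n, 2)


for alpha in (0.5, 0.4, 0.3, 0.2, 0.1):
    print(alpha, round(breakdown_point(alpha), 3))

# With n = 100 and m = 10 the exact fraction is close to (1 - f)^2 with f = 0.1.
print(unaffected_fraction(100, 10), (1 - 0.1) ** 2)
```

The printed values agree with the table above up to rounding in the third decimal place.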

One can also consider the $(\alpha_1, \alpha_2)$-Winsorized regression estimator by taking the weighted $(\alpha_1, \alpha_2)$-Winsorized (Huber, 1981) mean of $\hat\beta^{(i)}$. Other robust alternatives are straightforward.

For the general regression model (3.5) and the subset least squares estimates $\hat\beta_s$ (3.14), a weighted trimmed mean of $\hat\beta_s$, $s \in S_r$, can be obtained by (i) ordering the vector-valued $\hat\beta_s$ according to some criterion, e.g., the Mahalanobis distance, convex hull trimming or ellipsoidal trimming (Titterington, 1978), (ii) trimming the extreme values and (iii) taking a weighted average of the remaining ones. The weight can be proportional to $|X_s^T X_s|$, or to $|X_s^T X_s|^\lambda$ where the power $\lambda$ is chosen between 0 and 1. Another way is to apply the weighted univariate trimming to each component of $\hat\beta_s$ separately.

For the delete-one jackknife, $r = n-1$, our proposal results in the following estimator


(4.3) 


where $\hat\beta^{(i)}$ are the ordered values of $\hat\beta_{-i}$, $i = 1, \ldots, n$, $\hat\beta_{-i}$ = LSE of $\beta$ with the $i$th observation deleted, and $w_{(i)}$ is the weight $1 - w_i$, $w_i = x_i^T (X^T X)^{-1} x_i$, associated with $\hat\beta^{(i)}$. (In later sections we shall denote $\hat\beta_{-i}$ by $\hat\beta_{(i)}$.) Hinkley (1977) proposed a similar estimator, the main difference being that his "ordering" is on the weighted pseudo-values $Q_i = \hat\beta + n(1-w_i)(\hat\beta - \hat\beta_{-i})$. Without a detailed study, it is hard to judge their relative merits. We merely point out that $|Q_i - \hat\beta| \ge |\hat\beta_{-i} - \hat\beta|$ if $w_i \le 1 - n^{-1}$. This follows from (5.15) and (6.8).

Other regression L-estimators have been considered by Bickel (1973), Koenker and Bassett (1978), Ruppert and Carroll (1980). The main difference between our proposal and theirs is that the former directly involves repeated estimators of $\beta$ while the latter depend on the residuals. This may provide a clue to the possible advantages of our proposal.

Finally  we  consider  a  class  of  robust  regression  M-estiMtor  obtained  by  minimizing 

(4.4)  £  w  n($  -  B)  , 

seS  *  8 

r 

T  X  *  * 

where  *a  My  be  proportional  to  l*sxal  or  some  other  weight  function,  n (Sg  -  6)  » 

AAA  A 

♦  (IS  -  61),  IS  -  B*  is  a  distance  measure  of  S  -  8,  and  i|>  is  a  scalar  function 

8  8  S 

discussed  in  Ruber  (1981).  For  w^  -  |xaX#|  and  n  “  Euclidean  distance,  the  regression 
M-estiMte  (4.4)  reduces  to  the  ordinary  least  squares  estiMte  via  Theorem  2.  Hinkley 
(1977)  proposed  a  similar  estiMtor  for  the  delete-one  jackknife  in  terms  of  the  weighted 
pseudo-values  Since  (4.4)  requires  more  computation  than  the  usual  M-estiMtor,  it 

can  not  be  recommended  for  practical  use  until  further  desirable  properties  are  documented. 
5. General weighted jackknife in regression


Miller (1974) extended the ordinary (unweighted) jackknife to the regression situation
and proved some asymptotic properties. Since the subset LSE's are not exchangeable,
unweighted jackknifes do not seem natural. As a consequence, the acclaimed "bias-reducing"
property of the jackknife in the location case is lost here and the unweighted jackknife
variance estimator is biased even for linear parameters. Recognizing this problem, Hinkley
(1977) proposed a weighted jackknife and demonstrated its desirable properties. Both dealt
with jackknifing by deleting one observation at a time from the full data. In this section
we propose a class of weighted jackknife variance estimators of the LSE $\hat\beta$ by deleting any
fixed number of observations at a time. Jackknife estimation for nonlinear parameters
will be considered in Section 7.

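Since jackknifing by deleting d observations is equivalent to retaining r = n-d
observations, all the results of this section will be in terms of $\hat\beta_s$, the LSE based on
the subset s. The proposed jackknife variance estimator by recomputing the LSE for each
subset of size r is

(5.1)  $$v_{J,r} = \frac{r-k+1}{n-r}\, \frac{\sum_{s \in S_r} |X_s^T X_s|\, (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T}{\sum_{s \in S_r} |X_s^T X_s|}$$

(5.2)  $$= \binom{n-k}{r-k+1}^{-1} |X^T X|^{-1} \sum_{s \in S_r} |X_s^T X_s|\, (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T ,$$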


where $X_s^T X_s$ are assumed nonsingular for all $s \in S_r$ and (5.2) follows from (5.1) via
Lemma 1(ii). Under the ideal assumption that the errors $e_i$ in (3.5) have constant
variance $\sigma^2$, we prove that $v_{J,r}$ satisfies the minimal requirement that it is an
unbiased estimator of the variance of $\hat\beta$ under the same assumption. Other properties will
be taken up in Sections 6 and 7.

Theorem 3. If $\mathrm{Var}(e) = \sigma^2 I$ in (3.5),

(5.3)  $$E(v_{J,r}) = \sigma^2 (X^T X)^{-1} = \mathrm{Var}(\hat\beta) .$$

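Proof: From $\hat\beta_s - \hat\beta = (X_s^T X_s)^{-1} X_s^T (y_s - X_s \hat\beta) = (X_s^T X_s)^{-1} X_s^T r_s$, where
$r_s = y_s - X_s \hat\beta$ is the residual vector for the set s,

(5.4)  $$|X_s^T X_s|\, (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T = |X_s^T X_s|\, (X_s^T X_s)^{-1} X_s^T r_s r_s^T X_s (X_s^T X_s)^{-1}$$

and its expectation is

(5.5)  $$\sigma^2 |X_s^T X_s|\, (X_s^T X_s)^{-1} X_s^T \big(I_s - X_s (X^T X)^{-1} X_s^T\big) X_s (X_s^T X_s)^{-1} = \sigma^2 |X_s^T X_s|\, \big\{ (X_s^T X_s)^{-1} - (X^T X)^{-1} \big\} ,$$

where $I_s$ is the identity matrix for the set s. From Lemma 2

(5.5)'  $$\sum_{s \in S_r} |X_s^T X_s|\, (X_s^T X_s)^{-1} = \binom{n-k+1}{r-k+1} |X^T X|\, (X^T X)^{-1}$$

and from Lemma 1(ii),

$$\sum_{s \in S_r} |X_s^T X_s|\, (X^T X)^{-1} = \binom{n-k}{r-k} |X^T X|\, (X^T X)^{-1} ,$$

which, together with (5.5), imply

$$E\Big( \sum_{s \in S_r} |X_s^T X_s|\, (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T \Big) = \sigma^2 \binom{n-k}{r-k+1} |X^T X|\, (X^T X)^{-1} ,$$

thus proving the result.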
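The last step of the proof combines the two subset sums through Pascal's rule. A worked version of that step is sketched below; it assumes (5.5)' carries the coefficient $\binom{n-k+1}{r-k+1}$ and Lemma 1(ii) the coefficient $\binom{n-k}{r-k}$, which is how the identities are read here.

```latex
\begin{align*}
E\Big(\sum_{s \in S_r} |X_s^T X_s| (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T\Big)
  &= \sigma^2 \sum_{s \in S_r} |X_s^T X_s| \big\{ (X_s^T X_s)^{-1} - (X^T X)^{-1} \big\} \\
  &= \sigma^2 \Big\{ \binom{n-k+1}{r-k+1} - \binom{n-k}{r-k} \Big\} |X^T X| (X^T X)^{-1} \\
  &= \sigma^2 \binom{n-k}{r-k+1} |X^T X| (X^T X)^{-1} ,
\end{align*}
% Pascal's rule: C(m+1, j+1) = C(m, j) + C(m, j+1), with m = n-k and j = r-k.
% Multiplying by C(n-k, r-k+1)^{-1} |X^T X|^{-1}, the factor in (5.2), gives (5.3).
```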
The factor $\frac{r-k+1}{n-r}$ in $v_{J,r}$ can now be given a statistical interpretation. The sampling
error $\hat\beta - \beta$ has variance $\sigma^2 (X^T X)^{-1}$. Given $\hat\beta$, the "resampling error" $\hat\beta_s - \hat\beta$ has
$\big(\frac{r-k+1}{n-r}\big)^{-1} v_{J,r}$ as its (weighted) "resampling variance". Due to the unbiasedness result
(5.3), the original sampling error $\hat\beta - \beta$ and the resampling error $\hat\beta_s - \hat\beta$ have different
stochastic orders. The purpose of the scale factor $\sqrt{(r-k+1)/(n-r)}$ is to make the two
errors of the same order in the following sense,

(5.6)  $$\mathrm{Var}\Big( \sqrt{\tfrac{r-k+1}{n-r}}\, (\hat\beta_s - \hat\beta) \Big) = \mathrm{Var}(\hat\beta - \beta) + \text{lower order terms for all } s .$$

In fact, we have, from (5.3),

(5.6)'  $$\sum_{s \in S_r} w_s\, \mathrm{Var}\Big( \sqrt{\tfrac{r-k+1}{n-r}}\, (\hat\beta_s - \hat\beta) \Big) = \mathrm{Var}(\hat\beta - \beta) ,$$

where the weight $w_s$ is proportional to $|X_s^T X_s|$. In particular, for p = 1, this reduces
to

$$\mathrm{Var}\Big( \sqrt{\tfrac{r-k+1}{n-r}}\, (\hat\beta_s - \hat\beta) \Big) = \mathrm{Var}(\hat\beta - \beta) ,$$

since $w_s$ is a constant. In general, if the weights $w_s$ are uniformly bounded away
from 0 and 1, (5.6) follows from (5.6)'.

The implementation of $v_{J,r}$, r < n-1, and its extensions in more complex situations
(some described in Sections 7 and 9) may be too cumbersome since $\binom{n}{r}$ computations of $\hat\beta_s$
and $|X_s^T X_s|$ are required. As in the bootstrap method, it is suggested to use a Monte
Carlo approximation:

(i) draw J subsets randomly without replacement from $S_r$; denote the collection of the
selected subsets by S;
(ii) compute the variance estimate

$$v_{J,r} \approx \frac{r-k+1}{n-r}\, \frac{\sum_{s \in S} |X_s^T X_s|\, (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T}{\sum_{s \in S} |X_s^T X_s|} .$$
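Steps (i) and (ii) can be sketched in a few lines for the straight-line model y = a + bx (so k = 2). This is only an illustration under stated assumptions: the helper names `fit_line`, `weighted_jackknife` and `monte_carlo_jackknife` and the data are hypothetical, all x values are taken to be distinct so that every subset fit exists, and the subsets are drawn by enumerating $S_r$ first, which is practical only for small n.

```python
import itertools
import random

def fit_line(xs, ys):
    """Least squares fit of y = a + b*x.

    Returns ((a, b), det) where det = |X^T X| for the design with rows (1, x).
    """
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx
    b = (m * sxy - sx * sy) / det
    a = (sy - b * sx) / m
    return (a, b), det

def weighted_jackknife(xs, ys, r, subsets):
    """Ratio form of v_{J,r}, summed over the supplied subsets of size r."""
    n, k = len(xs), 2
    (a, b), _ = fit_line(xs, ys)
    num = [[0.0, 0.0], [0.0, 0.0]]
    den = 0.0
    for s in subsets:
        (a_s, b_s), det_s = fit_line([xs[i] for i in s], [ys[i] for i in s])
        diff = (a_s - a, b_s - b)
        for p in range(2):
            for q in range(2):
                num[p][q] += det_s * diff[p] * diff[q]   # weight |X_s^T X_s|
        den += det_s
    scale = (r - k + 1) / (n - r)
    return [[scale * num[p][q] / den for q in range(2)] for p in range(2)]

def monte_carlo_jackknife(xs, ys, r, J, seed=0):
    """Step (i): draw J distinct subsets of size r; step (ii): weighted average."""
    all_subsets = list(itertools.combinations(range(len(xs)), r))
    rng = random.Random(seed)
    chosen = rng.sample(all_subsets, J)   # sampling without replacement
    return weighted_jackknife(xs, ys, r, chosen)

xs = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 9.0, 11.0]
ys = [1.1, 1.9, 3.2, 3.8, 6.5, 7.7, 9.4, 10.6]
exact = weighted_jackknife(xs, ys, 5, itertools.combinations(range(8), 5))
approx = monte_carlo_jackknife(xs, ys, 5, J=20)
print(exact[1][1], approx[1][1])   # slope variance: all 56 subsets vs. 20 drawn
```

When J equals the number of subsets in $S_r$, the approximation coincides with the exact estimator; smaller J trades accuracy for computation.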

When the subset size r equals the number of parameters k, the jackknife variance
estimator $v_{J,k}$ can be defined without the additional assumption that $X_s^T X_s$ is
nonsingular for any $s \in S_k$. For $s \in S_k$, $\hat\beta_s = X_s^{-1} y_s$, $\hat\beta_s - \hat\beta = X_s^{-1} r_s$, $r_s$ in (5.4), and

$$|X_s^T X_s|\, (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T = |X_s|^2 X_s^{-1} r_s r_s^T (X_s^T)^{-1} = (\mathrm{adj}\, X_s)\, r_s r_s^T (\mathrm{adj}\, X_s)^T .$$

Note that adj $X_s$, the adjoint of $X_s$, is always defined whereas $|X_s| X_s^{-1}$ is defined only
for nonsingular $X_s$. This suggests defining

(5.7)  $$v_{J,k} = \frac{1}{(n-k)\, |X^T X|} \sum_{s \in S_k} |X_s|^2 (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T$$

(5.8)  $$= \frac{1}{(n-k)\, |X^T X|} \sum_{s \in S_k} (\mathrm{adj}\, X_s)\, r_s r_s^T (\mathrm{adj}\, X_s)^T ,$$

where (5.7), a special case of (5.2), requires $|X_s| \neq 0$ for all $s \in S_k$ while (5.8) is
well-defined without any additional restriction. The proof of Theorem 3 works for the
variance estimator (5.8). The only change involves replacing (5.5)' by a similar identity
in terms of the adjoint matrices (see (2.3)). For $v_{J,k}$ we can establish the following
coincidental result, which also implies the conclusion of Theorem 3.
Theorem 4. When the subset size is equal to the number of regression parameters, the
jackknife variance estimator $v_{J,k}$, (5.8), is identical to the usual unbiased variance
estimator

(5.9)  $$\widehat{\mathrm{var}} = \hat\sigma^2 (X^T X)^{-1}, \quad \hat\sigma^2 = \frac{r^T r}{n-k}, \quad r = y - X \hat\beta .$$

Proof: Note that

(5.10)  $$(\mathrm{adj}\, X_s)\, r_s = \Big[ \sum_j (-1)^{i+j} |X_s^{(ji)}|\, r_{s,j} \Big]_i = \Big[\, |X_s^{(i)}(r_s)| \,\Big]_i ,$$

where $X_s^{(ji)}$ = matrix obtained from deleting the jth row and the ith column of $X_s$, $r_{s,j}$ =
jth element of $r_s$, $X_s^{(i)}(r_s)$ = matrix obtained by replacing the ith column of $X_s$ by
$r_s$. The last equation of (5.10) follows from the usual result on expansion of determinant
(Noble, 1969, p. 208). From (5.10), the (i,j) element of the matrix
$\sum_s (\mathrm{adj}\, X_s)\, r_s r_s^T (\mathrm{adj}\, X_s)^T$ in (5.8) is equal to

(5.11)  $$\sum_{s \in S_k} |X_s^{(i)}(r_s)|\, |X_s^{(j)}(r_s)| = |X^{(i)}(r)^T X^{(j)}(r)| ,$$

where $X^{(i)}(r)$ is the matrix obtained by replacing the ith column of X by the residual
vector r. Since $X_s^{(i)}(r_s)$ is the k x k submatrix of $X^{(i)}(r)$ with rows corresponding
to the subset s, (5.11) follows from Lemma 1(i). Noting that r is orthogonal to the
other columns of $X^{(i)}(r)$ from the normal equation $X^T r = 0$, the (i,j) element of
$X^{(i)}(r)^T X^{(j)}(r)$ is $r^T r$, and the other elements in its ith row and jth column are zero.
This gives

(5.12)  $$|X^{(i)}(r)^T X^{(j)}(r)| = (-1)^{i+j}\, r^T r\, |X^{(i)T} X^{(j)}| ,$$

where $X^{(i)}$ is the submatrix of X with its ith column deleted. From (5.8), (5.11) and
(5.12), the result follows.

A bootstrap resampling method also leads to the estimator $\widehat{\mathrm{var}}$, (5.9). Details are
in Section 8.

Theorem 4 was proved by Subrahmanyam (1972) for the variance estimator (5.7) (not the
more general (5.8)) by assuming $|X_s| \neq 0$ for all s in $S_k$. It is important to distinguish
(5.8) from (5.7) in case $|X_s| = 0$ for some s. For a subset s with $|X_s| = 0$, it is
incorrect to interpret $|X_s|^2 (\hat\beta_s - \hat\beta)(\hat\beta_s - \hat\beta)^T$ in (5.7) to be zero as was done before in
the representation theorem. This is obviously so since the more general expression
$(\mathrm{adj}\, X_s)\, r_s r_s^T (\mathrm{adj}\, X_s)^T$ in (5.8) for singular or nonsingular $X_s$ is nonnegative definite
and is in general nonzero for singular $X_s$. Such an incorrect interpretation of (5.7) will
lead to a variance estimator smaller than $\hat\sigma^2 (X^T X)^{-1}$. A simple illustration follows.

Consider the simple linear regression model (3.1) with k = 2. The jackknife variance
estimator $v_{J,2}$ for the slope parameter $\beta$ has two forms

(5.13)  $$v_{J,2} = c \sum_{i<j} (x_i - x_j)^2 \Big( \frac{y_i - y_j}{x_i - x_j} - \hat\beta \Big)^2$$

(5.14)  $$= c \sum_{i<j} \big( y_i - y_j - \hat\beta (x_i - x_j) \big)^2 ,$$

where $c = \big( n(n-2) \sum_i (x_i - \bar x)^2 \big)^{-1}$; (5.13) comes from (5.7) and (5.14) from (5.8). In terms
of the residuals $e_i = y_i - \hat\alpha - \hat\beta x_i$, $(\hat\alpha, \hat\beta)$ = LSE of $(\alpha, \beta)$, (5.14) equals

$$c \sum_{i<j} (e_i - e_j)^2 = \Big( \sum_i (x_i - \bar x)^2 \Big)^{-1} (n-2)^{-1} \sum_i e_i^2 .$$

Apart from the constant c, the contribution of the pair (i,j) with $x_i = x_j$ to the
variance estimate is $(e_i - e_j)^2 = (y_i - y_j)^2$, which measures the variability within two
repeated runs. Interpreting terms in (5.13) with $x_i = x_j$ as zero amounts to ignoring the
internal variability of the responses with the same x value, thus leading to
underestimation of the true error.
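The effect of dropping the tied pairs can be seen on a small made-up data set with repeated x values. The sketch below is illustrative only (the function name `slope_variance_forms` and the numbers are hypothetical); it evaluates (5.13) with tied pairs set to zero, (5.14) over all pairs, and the usual estimator of the slope variance.

```python
def slope_variance_forms(xs, ys):
    """Return (v513_zeroed, v514, usual) for the slope in y = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    c = 1.0 / (n * (n - 2) * sxx)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # (5.13) with the tied pairs (x_i == x_j) wrongly treated as zero
    v513_zeroed = c * sum(
        (xs[i] - xs[j]) ** 2 * ((ys[i] - ys[j]) / (xs[i] - xs[j]) - b) ** 2
        for i, j in pairs if xs[i] != xs[j])
    # (5.14): defined for every pair, tied or not
    v514 = c * sum((ys[i] - ys[j] - b * (xs[i] - xs[j])) ** 2 for i, j in pairs)
    # usual estimator: residual mean square divided by sum of squares of x
    usual = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / ((n - 2) * sxx)
    return v513_zeroed, v514, usual

xs = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]      # three pairs of repeated runs
ys = [0.1, -0.2, 1.3, 0.9, 2.2, 1.8]
v513_zeroed, v514, usual = slope_variance_forms(xs, ys)
print(v513_zeroed, v514, usual)
```

Here (5.14) agrees with the usual estimator, while the zeroed version of (5.13) is smaller by c times the sum of $(y_i - y_j)^2$ over the three tied pairs.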

At the other end of the choice of r is the jackknife variance estimator $v_{J,n-1}$
obtained by taking every subset s of size n-1, or in a more familiar language, by
deleting one observation at a time. For each s in $S_{n-1}$, let i denote the element not
in s, and write $X_s = X_{(i)}$. We shall adopt the notation that the subscript "(i)" added
to a quantity means "with the ith observation deleted," and in a similar spirit, use $v_{J(1)}$
for the "delete-one" jackknife variance estimator $v_{J,n-1}$. From
$|X_{(i)}^T X_{(i)}| = (1 - w_i)\, |X^T X|$, $w_i = x_i^T (X^T X)^{-1} x_i$, (5.2) and

(5.15)  $$\hat\beta_{(i)} - \hat\beta = -(X^T X)^{-1} x_i r_i (1 - w_i)^{-1} ,$$

where $\hat\beta_{(i)}$ is the LSE of $\beta$ with the ith observation deleted and $r_i = y_i - x_i^T \hat\beta$ is the
residual, $v_{J(1)}$ takes a simple form

$$v_{J(1)} = \sum_{i=1}^n (1 - w_i)(\hat\beta_{(i)} - \hat\beta)(\hat\beta_{(i)} - \hat\beta)^T$$

(5.17)  $$= (X^T X)^{-1} \sum_{i=1}^n \frac{r_i^2}{1 - w_i}\, x_i x_i^T\, (X^T X)^{-1} .$$

It turns out that $v_{J(1)}$ enjoys a model-robustness property, which is the main theme of
the next section.
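The equivalence between the deletion form of $v_{J(1)}$ and the closed form (5.17) can be checked numerically. The following sketch does so for the straight-line model; the function name `delete_one_jackknife` and the data are hypothetical, and the x values are assumed distinct so that each delete-one fit exists.

```python
def delete_one_jackknife(xs, ys):
    """Delete-one weighted jackknife for y = a + b*x (rows x_i = (1, x_i))."""
    n = len(xs)

    def solve(idx):
        # LSE, |X^T X| and (X^T X)^{-1} using only the observations in idx
        m = len(idx)
        sx = sum(xs[i] for i in idx)
        sy = sum(ys[i] for i in idx)
        sxx = sum(xs[i] ** 2 for i in idx)
        sxy = sum(xs[i] * ys[i] for i in idx)
        det = m * sxx - sx * sx
        b = (m * sxy - sx * sy) / det
        a = (sy - b * sx) / m
        inv = [[sxx / det, -sx / det], [-sx / det, m / det]]
        return (a, b), det, inv

    beta, det, A = solve(list(range(n)))
    rows = [(1.0, x) for x in xs]
    resid = [y - beta[0] - beta[1] * x for x, y in zip(xs, ys)]
    # leverages w_i = x_i^T (X^T X)^{-1} x_i
    w = [sum(r[p] * A[p][q] * r[q] for p in range(2) for q in range(2))
         for r in rows]

    by_deletion = [[0.0, 0.0], [0.0, 0.0]]
    meat = [[0.0, 0.0], [0.0, 0.0]]
    det_ratio = []
    for i in range(n):
        beta_i, det_i, _ = solve([j for j in range(n) if j != i])
        det_ratio.append(det_i / det)          # should equal 1 - w_i
        d = (beta_i[0] - beta[0], beta_i[1] - beta[1])
        for p in range(2):
            for q in range(2):
                by_deletion[p][q] += (1 - w[i]) * d[p] * d[q]
                meat[p][q] += resid[i] ** 2 / (1 - w[i]) * rows[i][p] * rows[i][q]
    # closed form (5.17): (X^T X)^{-1} * meat * (X^T X)^{-1}
    closed = [[sum(A[p][u] * meat[u][v] * A[v][q]
                   for u in range(2) for v in range(2))
               for q in range(2)] for p in range(2)]
    return by_deletion, closed, w, det_ratio

xs = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
ys = [1.1, 1.9, 3.2, 3.8, 6.5, 7.7]
by_deletion, closed, w, det_ratio = delete_one_jackknife(xs, ys)
print(by_deletion[1][1], closed[1][1])   # the two forms of the slope variance
```

The closed form needs only the full-data fit, the residuals and the leverages $w_i$, so no refitting is required in practice.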


6.  Model-robustness  of  the  weighted  delete-one  jackknife  variance  estimators 

In Theorem 3 the general weighted jackknife variance estimators $v_{J,r}$ are shown to be
unbiased for $\mathrm{Var}(\hat\beta)$ if the errors are homoscedastic. It is natural to ask how $v_{J,r}$
will perform under violations of the homoscedasticity assumption. Under the
heteroscedasticity assumption $\mathrm{Var}(e) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, the variance of $\hat\beta$ is

(6.1)  $$\mathrm{Var}(\hat\beta) = (X^T X)^{-1} \sum_{i=1}^n \sigma_i^2\, x_i x_i^T\, (X^T X)^{-1} .$$

From comparing (5.17) and (6.1), it seems that $v_{J(1)}$ is robust for estimating $\mathrm{Var}(\hat\beta)$,
(6.1), under the broader heteroscedasticity assumption. This remarkable aspect of both
$v_{J(1)}$ and a related variance estimator will be treated in this section.

The asymptotic computations (as n becomes large) will be done under one or several
of the following assumptions.

(C) 1. Let $X_n$ denote the X matrix in (3.5) for n observations.

       $$\max_{1 \le i \le n} x_i^T (X_n^T X_n)^{-1} x_i \le \frac{c}{n}, \quad c \text{ independent of } n.$$

    2. $\max_{1 \le i < \infty} \sigma_i^2 < \infty$.

    3. The minimum and maximum eigenvalues of $n^{-1} X_n^T X_n$ are uniformly bounded away from
       0 and $\infty$.

    4. The elements of $X_n$ are uniformly bounded.


From comparing (5.17) and (6.1), the unbiasedness of $v_{J(1)}$ for estimating $\mathrm{Var}(\hat\beta)$
hinges on the relation $E r_i^2 = (1 - w_i)\sigma_i^2$. Conditions for its validity or approximate
validity are given in the next lemma.

Lemma 4. If

(6.2)  $$w_{ij} = x_i^T (X^T X)^{-1} x_j = 0 \quad \text{for any } i, j \text{ with } \sigma_i \neq \sigma_j ,$$

then

(6.3)  $$E r_i^2 = (1 - w_i)\sigma_i^2 , \quad w_i = x_i^T (X^T X)^{-1} x_i .$$

More generally, under the assumptions C1 and C2,

(6.4)  $$E r_i^2 = (1 - w_i)\sigma_i^2 + O(n^{-1}) ,$$

where the big O-notation $O(n^{-1})$ denotes terms of order $n^{-1}$.

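Proof: From $r_i = y_i - x_i^T \hat\beta = e_i - x_i^T (X^T X)^{-1} X^T e$,

(6.5)  $$E r_i^2 = (1 - 2 w_i)\sigma_i^2 + \sum_{j=1}^n w_{ij}^2 \sigma_j^2 = (1 - w_i)\sigma_i^2 + \sum_{j=1}^n w_{ij}^2 (\sigma_j^2 - \sigma_i^2) ,$$

where $w_{ij} = x_i^T (X^T X)^{-1} x_j$ and the second equality of (6.5) follows from $w_i = \sum_{j=1}^n w_{ij}^2$.
It is now obvious that (6.3) follows from (6.2). Assuming C1 and C2,

$$\Big| \sum_{j=1}^n w_{ij}^2 (\sigma_j^2 - \sigma_i^2) \Big| \le 2 (\max_j \sigma_j^2) \sum_{j=1}^n w_{ij}^2 = 2 (\max_j \sigma_j^2)\, w_i ,$$

which is of order $n^{-1}$. Therefore (6.4) follows from (6.5).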
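Formula (6.5) and the role of condition (6.2) can be checked exactly, without simulation, because $E r_i^2$ is a finite sum. The sketch below is an illustration for two-parameter models only; the function name `expected_squared_residuals` and both designs are hypothetical choices.

```python
def expected_squared_residuals(rows, variances):
    """Exact E r_i^2 for a two-parameter linear model with uncorrelated errors.

    rows[i] is x_i in R^2 and variances[i] is sigma_i^2. Returns three lists:
    direct  : sum_j (delta_ij - w_ij)^2 sigma_j^2, from r = (I - H) e
    formula : right-hand side of (6.5)
    leading : (1 - w_i) sigma_i^2, the term kept in (6.3)
    """
    n = len(rows)
    s11 = sum(u * u for u, _ in rows)
    s12 = sum(u * v for u, v in rows)
    s22 = sum(v * v for _, v in rows)
    det = s11 * s22 - s12 * s12
    A = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]   # (X^T X)^{-1}

    def w(i, j):
        return sum(rows[i][p] * A[p][q] * rows[j][q]
                   for p in range(2) for q in range(2))

    direct, formula, leading = [], [], []
    for i in range(n):
        wi = w(i, i)
        direct.append(sum(((1.0 if i == j else 0.0) - w(i, j)) ** 2 * variances[j]
                          for j in range(n)))
        formula.append((1 - wi) * variances[i]
                       + sum(w(i, j) ** 2 * (variances[j] - variances[i])
                             for j in range(n)))
        leading.append((1 - wi) * variances[i])
    return direct, formula, leading

# Straight-line design with unequal variances: (6.5) holds, (6.3) need not.
line = expected_squared_residuals(
    [(1.0, float(x)) for x in (1, 2, 3, 4, 6)], [1.0, 4.0, 9.0, 16.0, 36.0])
# Two-sample design: rows from different groups are orthogonal with respect to
# (X^T X)^{-1}, so condition (6.2) holds and (6.3) is exact.
two_sample = expected_squared_residuals(
    [(1.0, 0.0)] * 3 + [(0.0, 1.0)] * 4, [1.0] * 3 + [9.0] * 4)
print(line[0][0], line[2][0], two_sample[0][0], two_sample[2][0])
```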

By comparing (5.17) and (6.1), the following result is obtained as a direct
consequence of Lemma 4.

Theorem 5. Under (3.5),

(i) $E v_{J(1)} = \mathrm{Var}(\hat\beta)$ under (6.2),

(ii) $E v_{J(1)} = \mathrm{Var}(\hat\beta)(1 + O(n^{-1}))$ under (C1 - C2).

We are not able to prove a similar result for the more general $v_{J,r}$. We conjecture
that $v_{J,r}$ is also robust in the above sense for r close to n. This is confirmed in
the simulation study of Section 10.

The assumption C2 is weak; C1 is also reasonable since it is easy to show that it is
implied by C3 and C4. C3 says that $X_n^T X_n$ grows to infinity at the rate n. Usually a
stronger condition like $n^{-1} X_n^T X_n$ converging to a positive definite matrix is assumed
(Miller, 1974). On the other hand, (6.2) is a more restrictive assumption. Let q be the
number of different $\sigma_i$'s in (6.2). Then the linear model (3.5) can be rewritten as

(6.6)  $$y_{ik} = x_{ik}^T \beta + e_{ik}, \quad E e_{ik} = 0, \quad \mathrm{Var}\, e_{ik} = \sigma_i^2, \quad k = 1(1)n_i, \; i = 1(1)q ,$$

with uncorrelated errors. Let $T_i$ be the subspace spanned by $x_{ik}$, $k = 1(1)n_i$. According
to (6.2), $T_i$, i = 1(1)q, are orthogonal to each other with respect to the positive
definite matrix $(X^T X)^{-1}$. Writing $\dim(T_i)$ = dimension of $T_i$ in $R^k$, we have
$\sum_i \dim(T_i) = k$ since $T_i$, i = 1(1)q, generate the column space of X, whose dimension
is k. A special case of (6.6) is the k-sample problem with unequal variances. Let
$x_{ik} = x_i$ for $k = 1(1)n_i$. Then $x_i$, i = 1(1)q, are orthogonal to each other
w.r.t. $(X^T X)^{-1}$, which forces q = k. By writing $\theta_i = x_i^T \beta$, (6.6) becomes the k-sample
problem

(6.7)  $$y_{ik} = \theta_i + e_{ik}, \quad E e_{ik} = 0, \quad \mathrm{Var}\, e_{ik} = \sigma_i^2, \quad k = 1(1)n_i, \; i = 1(1)k .$$

The k x k matrix $Z^T = [x_1, \ldots, x_k]$ is nonsingular and $\theta = (\theta_1, \ldots, \theta_k)^T = Z\beta$ is a
reparametrization of $\beta$. We shall come back to (6.7) later.

Closely related to our $v_{J(1)}$ is a weighted delete-one jackknife method first
considered by Hinkley (1977). His approach is via the construction of pseudo-values in the
hope that the nice properties of pseudo-values in the location model would carry over to
the regression model. Specifically, define the ith weighted pseudo-value

(6.8)  $$Q_i = \hat\beta + n(1 - w_i)(\hat\beta - \hat\beta_{(i)}) = \hat\beta + n (X^T X)^{-1} x_i r_i , \quad w_i \text{ in (6.3)} .$$

Note that $Q_i$ differs from the unweighted pseudo-value in that the weight $(1 - w_i)$ is
attached to $\hat\beta - \hat\beta_{(i)}$ and that $(1 - w_i)$ is proportional to $|X_{(i)}^T X_{(i)}|$. Hinkley (1977)
pointed out that

(6.9)  $$\hat\beta = n^{-1} \sum_{i=1}^n Q_i .$$

The right hand expression of (6.9) is the usual jackknife point estimator in terms of the
pseudo-values. (6.9) is also a special case of the general representation (3.15). He then
defined the jackknife variance estimator in terms of $Q_i$

$$v_{H(1)} = \{n(n-k)\}^{-1} \sum_{i=1}^n (Q_i - \hat\beta)(Q_i - \hat\beta)^T$$

(6.10)  $$= \sum_{i=1}^n \frac{(1 - w_i)^2}{1 - n^{-1}k}\, (\hat\beta_{(i)} - \hat\beta)(\hat\beta_{(i)} - \hat\beta)^T$$

(6.11)  $$= (X^T X)^{-1} \sum_{i=1}^n \frac{r_i^2}{1 - n^{-1}k}\, x_i x_i^T\, (X^T X)^{-1}$$

as a direct extension of a similar definition in the location case. From comparing $v_{H(1)}$,
(6.11), and $v_{J(1)}$, (5.17), it seems that $v_{H(1)}$ is also robust in the sense of Theorem
5(ii). The comparison is however more favorable to the latter. Under the ideal assumption
$\mathrm{Var}(e) = \sigma^2 I$, $E r_i^2 = (1 - w_i)\sigma^2 \neq (1 - n^{-1}k)\sigma^2$. Therefore $E v_{H(1)} \neq \sigma^2 (X^T X)^{-1}$, although under
(C1) the difference is of lower order, i.e., $E v_{H(1)} = \sigma^2 (X^T X)^{-1} (1 + O(n^{-1}))$ since
$E r_i^2 - (1 - n^{-1}k)\sigma^2 = (n^{-1}k - w_i)\sigma^2 = O(n^{-1})$ under (C1). Under the broader assumption
$\mathrm{Var}(e) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, $E v_{H(1)} \neq \mathrm{Var}(\hat\beta) = (X^T X)^{-1} \sum_i \sigma_i^2 x_i x_i^T (X^T X)^{-1}$ even under the
restriction (6.2) of Theorem 5. As in Theorem 5(ii), $v_{H(1)}$ is approximately unbiased under
(C1 - C2), i.e. $E v_{H(1)} = \mathrm{Var}(\hat\beta)(1 + O(n^{-1}))$. This is because

$$\frac{E r_i^2}{1 - n^{-1}k} = \frac{1 - w_i}{1 - n^{-1}k}\, \sigma_i^2 + O(n^{-1}) = \sigma_i^2 + O(n^{-1}) ,$$

where the first equation follows from Lemma 4, (6.4), and the second equation follows from
(C1). The results concerning $v_{H(1)}$ are summarized in Theorem 6.

Theorem 6. (i) Under $\mathrm{Var}(e) = \sigma^2 I$ and (C1), $E v_{H(1)} \neq \mathrm{Var}(\hat\beta)$ but

$$E v_{H(1)} = \mathrm{Var}(\hat\beta)(1 + O(n^{-1})) .$$

(ii) Under $\mathrm{Var}(e) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$ and (C1 - C2),

$$E v_{H(1)} = \mathrm{Var}(\hat\beta)(1 + O(n^{-1})) .$$
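The pseudo-value construction and the two delete-one estimators can be compared side by side on a small example. The sketch below, for the straight-line model with k = 2, is illustrative only; the function name `hinkley_delete_one` and the data are hypothetical.

```python
def hinkley_delete_one(xs, ys):
    """Weighted pseudo-values Q_i of (6.8), v_{H(1)} of (6.10)-(6.11) and
    v_{J(1)} of (5.17) for the model y = a + b*x with rows x_i = (1, x_i)."""
    n, k = len(xs), 2
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    A = [[sxx / det, -sx / det], [-sx / det, n / det]]     # (X^T X)^{-1}
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    beta = (a, b)

    pseudo = []
    v_h = [[0.0, 0.0], [0.0, 0.0]]
    v_j = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        r = y - a - b * x                                   # residual r_i
        ax = (A[0][0] + A[0][1] * x, A[1][0] + A[1][1] * x) # (X^T X)^{-1} x_i
        w = ax[0] + x * ax[1]                               # leverage w_i
        q = (a + n * ax[0] * r, b + n * ax[1] * r)          # Q_i, second form of (6.8)
        pseudo.append(q)
        for p in range(2):
            for c in range(2):
                v_h[p][c] += (q[p] - beta[p]) * (q[c] - beta[c]) / (n * (n - k))
                v_j[p][c] += ax[p] * ax[c] * r * r / (1 - w)
    return beta, pseudo, v_h, v_j

xs = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
ys = [1.1, 1.9, 3.2, 3.8, 6.5, 7.7]
beta, pseudo, v_h, v_j = hinkley_delete_one(xs, ys)
print(v_h[1][1], v_j[1][1])   # slope variance under the two weightings
```

The only difference between the two estimators is the divisor applied to $r_i^2$: the common value $1 - n^{-1}k$ in $v_{H(1)}$ versus the observation-specific $1 - w_i$ in $v_{J(1)}$.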

Unlike $v_{J(1)}$, the exact unbiasedness $E v_{H(1)} = \mathrm{Var}(\hat\beta)$ does not hold true even in
special cases. In the simulation of Section 10, $v_{H(1)}$ is found to be more biased than
other estimators for both equal and unequal variances. Theorem 6 is a more rigorous
version of what is essentially in Hinkley (1977). The strong consistency of $v_{H(1)}$ was
established in Hinkley (1977, Lemma 2 of Appendix) by following Miller's (1974) proof for
the balanced jackknife. The strong consistency of $v_{J(1)}$ can be established in a similar
manner.

Standard asymptotic justifications of the jackknife variance estimators are in terms
of their consistency and the normality of the associated t-statistics. They confirm that the
jackknife method works asymptotically as well as the classical $\delta$-method. Then, why should
the jackknife be chosen over the $\delta$-method except possibly for computational or other
practical reasons? The "robustness" of $v_{J(1)}$ and $v_{H(1)}$ (Theorems 5(ii) and 6(ii))
against the heterogeneity of errors, first recognized in Hinkley (1977), is a truly fresh
and important property of the jackknife methodology. (Practically speaking this advantage
should be put in the context of nonlinear estimation, Sections 7 and 9.)

To further appreciate this point, let us consider the robustness aspect of the usual
variance estimator $\widehat{\mathrm{var}} = \hat\sigma^2 (X^T X)^{-1}$. For $\mathrm{Var}(e) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, from (6.5)

$$E \hat\sigma^2 = \sum_{i=1}^n \frac{1 - w_i}{n-k}\, \sigma_i^2 = \bar\sigma^2 .$$

Therefore

$$E\, \widehat{\mathrm{var}} = \bar\sigma^2 (X^T X)^{-1}$$

is equal to

$$\mathrm{Var}(\hat\beta)(1 + O(n^{-1})) = (X^T X)^{-1} \sum_{i=1}^n (\sigma_i^2 + O(n^{-1}))\, x_i x_i^T\, (X^T X)^{-1}$$

if $\max_i |\sigma_i^2 - \bar\sigma^2| = O(n^{-1})$, or equivalently,

(6.13)  $$\max_{1 \le i \le n} \sigma_i^2 - \min_{1 \le i \le n} \sigma_i^2 = O(n^{-1}) ,$$

since $\bar\sigma^2$ is a weighted average of $\sigma_i^2$. The condition (6.13) is sufficient for the
robustness of $\widehat{\mathrm{var}}$ in the sense of Theorem 5(ii). However the result is quite
uninteresting since (6.13) forces the variances to be nearly equal for large n. A
detailed comparison of $v_{J(1)}$, $v_{H(1)}$, $\widehat{\mathrm{var}}$ and other bootstrap variance estimators for the
2-sample problem will be given in Section 8.

To  close  this  section,  we  shall  make  two  other  remarks. 

1.  Tukey's  reformulation  of  Quenouille's  jackknife  in  terms  of  the  pseudo-values  works  well 
for  the  i.i.d.  case.  Its  extension  to  the  non-i.i.d.  situations  may  lead  to  less  desirable 
results as is evidenced by the slight inferiority of $v_{H(1)}$ to $v_{J(1)}$. A more striking
example  is  offered  in  the  context  of  inference  from  stratified  samples.  Two  jackknife 
point  estimators  have  been  proposed  in  terms  of  some  properly  defined  pseudo-values,  both 
of  which  reduce  to  the  usual  jackknife  point  estimator  in  the  unstratified  case.  It  was 
found  recently  (Rao  and  Wu,  1983a)  that  neither  estimator  reduces  bias  as  is  typically 
claimed  for  the  jackknife.  On  the  other  hand  a  truly  bias-reducing  jackknife  estimator  was 
not  motivated  by  the  pseudo-values. 

2. We suppose that the purpose of jackknife variance estimation is to aid the point
estimator $\hat\beta$ in making inference about $\beta$. The variance estimators are then required to
be nonnegative and almost unbiased. However in situations like the determination of sample
size, the variance itself is the parameter of primary interest and other risk criteria like
the mean square error (MSE) will be more appropriate. In this context, a nonnegative
biased estimator (J. N. K. Rao, 1973) and MINQUE (C. R. Rao, 1970) (which may take negative
values) have been proposed. Horn, Horn and Duncan (1975) proposed $(1 - w_i)^{-1} r_i^2$, which
appears in $v_{J(1)}$, (5.17), as an estimator of $\sigma_i^2$ and called it AUE (almost unbiased
estimator). The MSE of $(1 - w_i)^{-1} r_i^2$ was shown to be smaller than that of MINQUE in a wide
range of situations (Horn and Horn, 1975). It is difficult to extend this comparison to
estimation of the variance-covariance matrix.

7.  Jackknifing  for  nonlinear  parameters 

So far we have confined our study of the jackknife to the linear parameters as an
important test case. Its utility as a practical tool is more appreciated in the complex
situations where no exact results are available. In this section we first consider a simple
nonlinear situation. The parameter of interest $\theta = g(\beta)$ is a nonlinear function of the
linear parameter $\beta$ in the model (3.5). The natural estimator of $\theta$ is $\hat\theta = g(\hat\beta)$. In
this and the next section we will consider variance and interval estimation for $\theta$. Bias
reduction of $\hat\theta$ will be considered in Section 9. Extensions to nonlinear regression
models will be briefly outlined later.

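A natural extension of the general weighted jackknife variance estimator $v_{J,r}$,
(5.1), for the nonlinear estimator $\hat\theta = g(\hat\beta)$ is

(7.1)  $$v_{J,r}(\hat\theta) = \frac{r-k+1}{n-r}\, \frac{\sum_{s \in S_r} |X_s^T X_s|\, (\hat\theta_s - \hat\theta)(\hat\theta_s - \hat\theta)^T}{\sum_{s \in S_r} |X_s^T X_s|} ,$$

where $\hat\theta_s = g(\hat\beta_s)$ and $\hat\beta_s$ is assumed to exist for any $s \in S_r$. Another extension of
$v_{J,r}$ is

(7.2)  $$\tilde v_{J,r}(\hat\theta) = \frac{\sum_{s \in S_r} |X_s^T X_s|\, (\tilde\theta_s - \hat\theta)(\tilde\theta_s - \hat\theta)^T}{\sum_{s \in S_r} |X_s^T X_s|} ,$$

(7.3)  $$\tilde\theta_s = g(\tilde\beta_s), \quad \tilde\beta_s = \hat\beta + \sqrt{\frac{r-k+1}{n-r}}\, (\hat\beta_s - \hat\beta) .$$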

Both can be implemented by Monte Carlo approximation as in the linear case. In (7.2) the
scale factor $\sqrt{r-k+1}/\sqrt{n-r}$ is applied internally to $\hat\beta_s - \hat\beta$, while in (7.1) it is applied
externally to $\hat\theta_s - \hat\theta$ after the transformation g. Under reasonable smoothness conditions
on g, both $v_{J,r}(\hat\theta)$ and $\tilde v_{J,r}(\hat\theta)$ will be close to the linearization (or $\delta$-method)
variance estimator

(7.4)  $$v_{\mathrm{lin}} = g'(\hat\beta)^T\, v_{J,r}\, g'(\hat\beta) ,$$

where $g'(\hat\beta)$ is the derivative vector of g evaluated at $\hat\beta$. For variance estimation
there is perhaps little difference in choosing between $v_{J,r}(\hat\theta)$ and $\tilde v_{J,r}(\hat\theta)$. The
internal scaling (7.3) turns out to be instrumental in the following construction of the
jackknife distribution based on repeated sampling of subsets of size r:

(i) draw subsets $s_1, \ldots, s_J$ randomly without replacement from $S_r$;

(ii) construct a weighted empirical distribution function $\mathrm{CDF}_J(t)$ based on $g(\tilde\beta_{s_i})$, $\tilde\beta_{s_i}$
defined in (7.3), i = 1(1)J, with weight proportional to $|X_{s_i}^T X_{s_i}|$.

Similar to Efron's (1982) bootstrap percentile method is the jackknife percentile
method consisting of taking

(7.5)  $$[\, \mathrm{CDF}_J^{-1}(\alpha), \; \mathrm{CDF}_J^{-1}(1 - \alpha) \,]$$

as an appropriate $1 - 2\alpha$ central confidence interval for $\theta$. Since $\mathrm{CDF}_J(t)$ is a
discrete function, (7.5) is computed with a continuity correction. For multiparameters
$\theta$, a confidence region can be similarly constructed once the shape of the region is
determined. Efron (1982, Chapter 10) considered the smoothed percentile, bias-corrected
percentile and bootstrap t as modifications of the bootstrap percentile method. The same
idea can be applied to the jackknife percentile method in a straightforward manner. It is
more natural to apply the internal scaling (7.3) since $\hat\beta$ is the center of the weighted
distribution of $\tilde\beta_s$ due to the representation result, Theorem 2, while $\hat\theta$ may be shifted
from the center of the weighted distribution of $\hat\theta_s$ due to the nonlinear distortion g.
For this reason we think (7.2) may be more natural than (7.1). The issue of internal or
external scaling also arises in the context of bootstrap inference from stratified samples,
where it is found that a standard bootstrap method involving a single external scale
adjustment gives rise to incorrect variance estimate (Efron, 1982; Bickel and Freedman,
1984), since the corresponding internal scales vary from stratum to stratum. This
observation has led Rao and Wu (1983b) to construct a valid bootstrap method by applying
the internal scale factor within every stratum before applying the transformation g.
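A rough sketch of the jackknife percentile construction with internal scaling is given below for the straight-line model and a scalar function g of (a, b). Several choices are assumptions made for illustration rather than taken from the text: all subsets of size r are enumerated instead of being sampled, no continuity correction is applied, the quantile is the plain left-continuous inverse of the weighted empirical distribution, and the names `scaled_replicates` and `jackknife_percentile_interval` and the data are hypothetical.

```python
import itertools
import math

def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns ((a, b), |X^T X|)."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx
    b = (m * sxy - sx * sy) / det
    a = (sy - b * sx) / m
    return (a, b), det

def scaled_replicates(xs, ys, r):
    """Internally scaled subset estimates of (7.3) with weights |X_s^T X_s|."""
    n, k = len(xs), 2
    (a, b), _ = fit_line(xs, ys)
    scale = math.sqrt((r - k + 1) / (n - r))
    reps = []
    for s in itertools.combinations(range(n), r):
        (a_s, b_s), det_s = fit_line([xs[i] for i in s], [ys[i] for i in s])
        reps.append((a + scale * (a_s - a), b + scale * (b_s - b), det_s))
    return reps

def jackknife_percentile_interval(xs, ys, r, g, alpha):
    """Interval (7.5) from the weighted empirical distribution of g."""
    draws = sorted((g(a_t, b_t), w) for a_t, b_t, w in scaled_replicates(xs, ys, r))
    total = sum(w for _, w in draws)

    def quantile(p):
        acc = 0.0
        for value, w in draws:
            acc += w
            if acc >= p * total:
                return value
        return draws[-1][0]   # guard against rounding at p close to 1

    return quantile(alpha), quantile(1 - alpha)

xs = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 9.0, 11.0]
ys = [1.1, 1.9, 3.2, 3.8, 6.5, 7.7, 9.4, 10.6]
# theta = a / b, a nonlinear function of the regression parameters
lo, hi = jackknife_percentile_interval(xs, ys, 4, lambda a, b: a / b, 0.05)
print(lo, hi)
```

Because the weighted mean of the scaled subset estimates equals the full-data LSE, the histogram of the transformed values is centered where the representation result says it should be before g is applied.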

In a more complex situation like the nonlinear regression model

(7.6)  $$y_i = f_i(\theta) + e_i ,$$

where $f_i$ is a nonlinear smooth function of $\theta$ and the error $e_i$ satisfies the
assumptions in (3.5), the jackknife variance estimator $v_{J,r}$, (5.1), has a natural
extension, namely, to replace $X_s^T X_s$ by $\sum_i f_i'(\hat\theta) f_i'(\hat\theta)^T$ with i summing over s and to
interpret $\hat\theta$ and $\hat\theta_s$ as the nonlinear least squares estimates based on the full data and
the subset s; $f_i'(\theta)$ is the vector of derivatives of $f_i$ with respect to $\theta$. We may
consider alternative weight functions to avoid the computation of $f_i'$ or by evaluating it
at other estimates of $\theta$. Another approach that requires less computation was proposed by
Fox, Hinkley and Larntz (1980). Confidence intervals can be constructed from the jackknife
histogram based on $\tilde\theta_s$, (7.3), with the weight function discussed above. Their properties
and a similar extension to the generalized linear models will be reported later.

The term "jackknife" is commonly identified in the literature with the delete-one
jackknife. According to Efron's (1982) simulation results, the delete-one jackknife does
not in general perform as well as the bootstrap. We think there are two reasons for
this. First, the delete-one point estimate $\hat\theta_{(i)} = g(\hat\beta_{(i)})$ is too close to $\hat\theta = g(\hat\beta)$ to
reflect the true variability of $\hat\theta - \theta = g(\hat\beta) - g(\beta)$. The linearly adjusted
$\sqrt{n-k}\, (\hat\theta_{(i)} - \hat\theta)$, though correct to the first order, does not take account of the
nonlinearity that the function g has undergone between $\hat\beta$ and $\beta$, since $\hat\beta - \beta$ is of
larger order of magnitude than $\hat\beta_{(i)} - \hat\beta$. For nonsmooth $\hat\theta$ like the median in the
location case, the delete-one jackknife variance estimator is not even consistent. The
second reason is that the delete-one jackknife generates exactly n resampled estimates
$\hat\theta_{(i)}$. Except for very large n, they do not provide enough values for constructing
histograms. This is why the delete-one jackknife method is traditionally associated with
variance estimation. The resulting symmetric confidence intervals of the form
$[\hat\theta - t\hat\sigma, \hat\theta + t\hat\sigma]$ have a serious drawback, namely, they cannot reflect the possible
skewness inherent in the original estimate $\hat\theta$ around $\theta$. On the other hand, the
histogram-based confidence intervals do reflect the skewness in $\hat\theta$. For the bootstrap
method, this was rigorously established in Singh (1980) for some estimators.
This and the fact that $\hat\theta^* - \hat\theta$ and $\hat\theta - \theta$ are of the same order of magnitude, where $\hat\theta^*$
is the bootstrap estimate of $\theta$, perhaps explain the general good performance of the
bootstrap histogram methods over the delete-one jackknife method.

It should be clear by now why we propose the general jackknife method by deleting more
than one observation. It generates more pseudo-replicates of $\hat\theta$ to allow for the
construction of a histogram. For n = 20, the delete-two jackknife generates 190 values
instead of the meager 20 values given by the delete-one jackknife. Regarding the
question of the choice of r, we may choose r to make the scale factor (r-k+1)/(n-r)
near one, that is, $r \approx (n+k-1)/2$. This choice guarantees that $\hat\theta_s - \hat\theta$ is of the same
stochastic order as $\hat\theta - \theta$ and makes it unnecessary to perform the internal scale
adjustment (7.3). For the location problem k = 1, choosing r = n/2 reduces to the
half-sampling procedure (Efron, 1982, Chapter 8). Another advantage of choosing r near
(n+k-1)/2 is in variance estimation when the parameter of interest is not a smooth function.
The inconsistency of the delete-one jackknife variance estimator for nonsmooth estimates
like the median can be avoided. In the context of complex sample surveys, the balanced
half-sample method of McCarthy (1966) is found to provide more reliable confidence
intervals than the linearization method and the jackknife method in the empirical study of
Kish and Frankel (1973).
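The counts and the rule of thumb for r in this paragraph are easy to tabulate. The helper names below are hypothetical, and picking the subset size whose scale factor is closest to one on the log scale is just one way to make "near one" precise.

```python
import math

def replicate_count(n, d):
    """Number of pseudo-replicates produced by the delete-d jackknife."""
    return math.comb(n, d)

def scale_factor(n, k, r):
    """The factor (r-k+1)/(n-r) attached to subsets of size r."""
    return (r - k + 1) / (n - r)

def balanced_subset_size(n, k):
    """Subset size r in [k, n-1] whose scale factor is closest to one."""
    return min(range(k, n), key=lambda r: abs(math.log(scale_factor(n, k, r))))

print(replicate_count(20, 1), replicate_count(20, 2))   # delete-one vs. delete-two
print(balanced_subset_size(20, 1), scale_factor(20, 1, 10))
```

For the location problem with n = 20 and k = 1 this selects r = 10, the half-sample case mentioned above.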


8.  Bootstrap  and  subset  sampling  in  regression 

Can the previous results for the jackknife be extended to other resampling methods?
For a given resampling method denoted by *, with $\hat\beta^*$ and $D^*$ defined in (3.7), we would like
to find a variance estimator of the form

(8.1)  $$v = \lambda\, E_*\, w_* (\hat\beta^* - \hat\beta)(\hat\beta^* - \hat\beta)^T ,$$

where the weight $w_*$ is proportional to $|X^T D^* X|$ and $E_* w_* = 1$, such that it satisfies
the minimal requirement (as in Theorem 3)

(8.2)  $$E(v \mid \mathrm{Var}(e) = \sigma^2 I) = \sigma^2 (X^T X)^{-1} .$$

The left hand side of (8.2) is equal to

$$\lambda \sigma^2 E_* \{ w_* (X^T D^* X)^{-1} X^T D^{*2} X (X^T D^* X)^{-1} - w_* (X^T X)^{-1} \}$$

(8.3)  $$= \lambda \sigma^2 \{ E_* ( w_* (X^T D^* X)^{-1} X^T D^{*2} X (X^T D^* X)^{-1} ) - (X^T X)^{-1} \} .$$

The first term inside the curly bracket of (8.3) seems intractable except for

(8.4)  $$D^{*2} = D^* , \quad D^* = \mathrm{diag}(P_1^*, \ldots, P_n^*) ,$$

which is equivalent to $P_i^* = 0$ or 1 for all i. This prompts us to consider procedures
satisfying Assumption (B) and condition (8.4), whose defining probabilities

(8.5)  $$\mathrm{Prob}_*(P_{i_1}^* = \cdots = P_{i_r}^* = 1, \text{ remaining } P_j^* = 0) = \frac{c_r}{\binom{n}{r}}$$

are independent of the subset $(i_1, \ldots, i_r)$, where $c_r$ is the probability of resampling a
subset of size r, $c_k + \cdots + c_n = 1$. When $c_r = 1$, (8.5) reduces to (3.13), the jackknife
with fixed subset size r. Therefore we call any procedure satisfying (B) and (8.4) a
subset sampling procedure. Some may prefer to call it a variable jackknife. Back to
(8.3), its first term under (8.4) is

$$\frac{E_* |X^T D^* X|\, (X^T D^* X)^{-1}}{E_* |X^T D^* X|} = \frac{E_*\, \mathrm{adj}(X^T D^* X)}{E_* |X^T D^* X|} ,$$

whose (i,j) element is

(8.6)  $$(-1)^{i+j} \frac{E_* |X^{(j)T} D^* X^{(i)}|}{E_* |X^T D^* X|} = (-1)^{i+j} \frac{\sum_{s \in S_{k-1}} |X_s^{(j)T} X_s^{(i)}|\, E_* \prod_{l \in s} P_l^*}{\sum_{s \in S_k} |X_s|^2\, E_* \prod_{l \in s} P_l^*}$$

(8.7)  $$= \frac{a_{k-1}}{a_k}\, \frac{(-1)^{i+j} |X^{(j)T} X^{(i)}|}{|X^T X|} = \frac{a_{k-1}}{a_k}\, (i,j) \text{ element of } (X^T X)^{-1} ,$$

where the expansion in (8.6) is justified by Lemma 1(i) and

(8.8)  $$a_k = \mathrm{Prob}_*(P_1^* = \cdots = P_k^* = 1) .$$

From (8.1), (8.3) and (8.7), for a subset sampling procedure *, we have found that the variance estimator

(8.9)  $v = \left(\frac{\alpha_{k-1}}{\alpha_k} - 1\right)^{-1}\frac{E_*|X^TD^*X|\,(\beta^*-\hat\beta)(\beta^*-\hat\beta)^T}{E_*|X^TD^*X|}$

satisfies the unbiasedness requirement (8.2). For the special case of jackknifing with fixed subset size r, $\alpha_{k-1}/\alpha_k - 1 = (n-r)/(r-k+1)$ and (8.9) reduces to the jackknife variance estimator $v_{J,r}$, (5.1).

Note that the scale factor in (8.9),

(8.10)  $\left(\frac{\alpha_{k-1}}{\alpha_k}-1\right)^{-1} = \frac{\alpha_k/\alpha_{k-1}}{1-\alpha_k/\alpha_{k-1}} = \frac{\mathrm{Prob}_*(P_k^*=1 \mid P_1^*=\cdots=P_{k-1}^*=1)}{\mathrm{Prob}_*(P_k^*=0 \mid P_1^*=\cdots=P_{k-1}^*=1)}$ ,

is a conditional odds ratio given that the first k-1 units have been selected. For jackknifing with subset size r, this alternative interpretation of the scale factor (r-k+1)/(n-r) may be useful.
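
The scale factor in (8.9) can be checked numerically. The sketch below is only an illustration: the function names `alpha`, `scale_factor` and `conditional_odds` are hypothetical, and the dictionary `c` (subset size mapped to its selection probability $c_r$) is an assumed encoding of (8.5). It evaluates $\alpha_k$ in exact rational arithmetic and compares the scale factor with the conditional odds of (8.10).

```python
from fractions import Fraction
from math import comb

def alpha(j, n, c):
    """Prob_*(P_1* = ... = P_j* = 1) when a subset of size r is drawn with
    probability c[r] and all subsets of a given size are equally likely."""
    return sum((Fraction(c_r) * comb(n - j, r - j) / comb(n, r)
                for r, c_r in c.items() if r >= j), Fraction(0))

def scale_factor(k, n, c):
    """(alpha_{k-1}/alpha_k - 1)^{-1}, the scale factor in (8.9)."""
    return 1 / (alpha(k - 1, n, c) / alpha(k, n, c) - 1)

def conditional_odds(k, n, c):
    """Odds that unit k is selected given that units 1..k-1 were selected."""
    p = alpha(k, n, c) / alpha(k - 1, n, c)
    return p / (1 - p)

n, k, r = 12, 3, 8
fixed = scale_factor(k, n, {r: 1})            # jackknife with fixed subset size
variable = {7: Fraction(1, 3), 8: Fraction(1, 3), 9: Fraction(1, 3)}
mixed = scale_factor(k, n, variable)          # a "variable jackknife"
print(fixed, Fraction(r - k + 1, n - r), mixed)
```

With a single subset size the factor equals (r-k+1)/(n-r); with several sizes it is still the conditional odds, which is the point of (8.10).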

Among those resampling procedures that do not satisfy (8.4), i.e., $\mathrm{Prob}_*(P_i^* \ge 2$ for some i) > 0, we single out the bootstrap, (3.12), for further study. Unfortunately it will be shown next via a simple counterexample that no variance estimator (8.1) for the bootstrap can satisfy (8.2) in general. Consider the following regression model:

(8.11)  $y_{i1} = \beta_1 + e_{i1}$ ,  $i = 1,\ldots,n_1$ ;  $y_{i2} = \beta_2 + e_{i2}$ ,  $i = 1,\ldots,n_2$ ;  $n = n_1 + n_2$ ,

with uncorrelated errors, $E e_{ij} = 0$ and $\mathrm{Var}\,e_{i1} = \sigma_1^2$, $\mathrm{Var}\,e_{i2} = \sigma_2^2$. Let $P^* = (P_1^*,\ldots,P_n^*)$ be a resampling vector from the bootstrap method, (3.12), with the bootstrap sample size m = n. Rewrite $P^* = (P_{11}^*,\ldots,P_{n_1 1}^*, P_{12}^*,\ldots,P_{n_2 2}^*)$ to correspond to the two samples of (8.11) and define $n_1^* = \sum_{i=1}^{n_1}P_{i1}^*$, $n_2^* = \sum_{i=1}^{n_2}P_{i2}^*$. Then

(8.12)  $n_j^* \sim B(n, n_j/n)$ ,  j = 1,2 ,

is a binomial distribution with parameters n and $n_j/n$, and the conditional distribution

(8.13)  $(P_{1j}^*,\ldots,P_{n_j j}^*) \mid n_j^* \sim \mathrm{Mult}(n_j^*;\ 1/n_j,\ldots,1/n_j)$

is a multinomial distribution with $n_j^*$ independent draws on $n_j$ categories each having probability $1/n_j$. Then $\beta_j^* = n_j^{*-1}\sum_i P_{ij}^* y_{ij}$, j = 1,2, $|X^TD^*X| = n_1^* n_2^*$ and


(8.14)  $|X^TD^*X|\,(\beta_1^*-\hat\beta_1)^2 = \frac{n_2^*}{n_1^*}\left\{\sum_{i=1}^{n_1}P_{i1}^*(y_{i1}-\bar y_1)\right\}^2$ ,

where $\bar y_1 = n_1^{-1}\sum_{i=1}^{n_1} y_{i1}$. From (8.13),

(8.15)  $E_*\left(|X^TD^*X|\,(\beta_1^*-\hat\beta_1)^2 \mid n_1^*\right) = \frac{n_2^*}{n_1^*}\,\mathrm{Var}_*\left(\sum_{i=1}^{n_1}P_{i1}^*(y_{i1}-\bar y_1) \,\Big|\, n_1^*\right) = \frac{n_2^*}{n_1^*}\,n_1^*\,\frac{1}{n_1}\sum_{i=1}^{n_1}(y_{i1}-\bar y_1)^2 = \frac{n_2^*}{n_1}\,SS_1$ ,

where $SS_1 = \sum_{i=1}^{n_1}(y_{i1}-\bar y_1)^2$.

For the unweighted bootstrap,

(8.16)  $E_*(\beta^*-\hat\beta)(\beta^*-\hat\beta)^T = \mathrm{diag}\left(\frac{SS_1}{n_1}E_*(n_1^{*-1} \mid n_1^*\ge 1),\ \frac{SS_2}{n_2}E_*(n_2^{*-1} \mid n_2^*\ge 1)\right)$ ,

which is well-defined if $n_1^*$ and $n_2^* \ge 1$. For small or moderate $n_i$, $E_*(n_i^{*-1} \mid n_i^*\ge 1)$ is not close to $(n_i-1)^{-1}$ and the unweighted estimator (8.16) is biased. If $n_1$ and $n_2$ are both large, $E_*(n_i^{*-1} \mid n_i^* \ge 1) \approx (n_i-1)^{-1}$ and (8.16) is almost unbiased. It is, however, unclear whether this can be extended to general linear models even when the error variances are equal.
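
How far $E_*(n_i^{*-1} \mid n_i^* \ge 1)$ is from $(n_i-1)^{-1}$ can be computed exactly from the binomial distribution in (8.12). The following is a minimal sketch; the function name `mean_inverse_count` and the sample sizes tried are illustrative choices, not taken from the report.

```python
from math import comb

def mean_inverse_count(n1, n):
    """E_*(1/n1* | n1* >= 1) for n1* ~ Binomial(n, n1/n), as in (8.12)."""
    p = n1 / n
    pmf = [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)]
    # Condition on at least one draw from the first sample.
    return sum(pmf[m] / m for m in range(1, n + 1)) / (1 - pmf[0])

for n1, n in [(3, 6), (10, 20), (200, 400)]:
    # Compare the conditional mean of 1/n1* with 1/(n1 - 1).
    print(n1, n, mean_inverse_count(n1, n), 1 / (n1 - 1))
```

For small groups the two quantities differ noticeably, while for large groups both are close to $1/n_i$, in line with the remark following (8.16).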

It is not surprising that the unweighted bootstrap does not provide an unbiased variance estimator, since, as in the case of the unweighted jackknife, the LSE's $\beta^*$ based on the bootstrap resamples are not exchangeable. What is more disappointing is the failure of the weighted bootstrap. One would expect it to perform well since the same weight function was used in jackknife resampling with satisfactory results.

If (8.11) is recognized as a two-sample problem rather than a regression problem, unbiased variance estimators can be obtained by bootstrapping and rescaling within each sample. But the main point we have tried to make here is that a result like Theorem 3 cannot be extended to the bootstrap method for the general linear model (3.5). It is however unclear what will happen if the weight $w_*$ in (8.1) is chosen differently from $|X^TD^*X|$.

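On the other hand, the jackknife works quite well here. Routine computation gives

$v_{J(1)} = \mathrm{diag}\left(\frac{SS_1}{n_1(n_1-1)},\ \frac{SS_2}{n_2(n_2-1)}\right)$

and

$E\,v_{J(1)} = \mathrm{diag}\left(\frac{\sigma_1^2}{n_1},\ \frac{\sigma_2^2}{n_2}\right)$ .

The latter also follows from Theorem 5(i) since the model (8.11) satisfies (6.2). It can be shown that the delete-two jackknife

$v_{J,n-2} = v_{J(1)}$

and that

$v_{H(1)} = \frac{n}{n-2}\,\mathrm{diag}\left(\frac{SS_1}{n_1^2},\ \frac{SS_2}{n_2^2}\right)$ ,

which is biased but becomes approximately unbiased as $n_1$ and $n_2$ become large. The usual variance estimator

$\frac{SS_1 + SS_2}{n-2}\,\mathrm{diag}\left(\frac{1}{n_1},\ \frac{1}{n_2}\right)$

is unbiased for $\sigma_1^2 = \sigma_2^2$, approximately unbiased for $\sigma_1^2$ near $\sigma_2^2$ and biased otherwise.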
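
The closed form for $v_{J(1)}$ in the two-sample model can be verified directly from the weighted delete-one definition $v_{J(1)} = \sum_i (1-w_i)(\hat\beta_{(i)}-\hat\beta)(\hat\beta_{(i)}-\hat\beta)^T$. The sketch below treats one group at a time, using the fact that in model (8.11) the leverage of every unit in group j is $1/n_j$ and deleting a unit leaves the other group mean unchanged. The function names and the data are made up for illustration.

```python
def delete_one_jackknife_var(y):
    """Diagonal entry of v_J(1) for one group of the two-sample model (8.11)."""
    m = len(y)
    beta_hat = sum(y) / m                    # LSE of the group mean
    w = 1 / m                                # leverage of each unit in the group
    total = 0.0
    for i in range(m):
        beta_i = (sum(y) - y[i]) / (m - 1)   # group mean with unit i deleted
        total += (1 - w) * (beta_i - beta_hat) ** 2
    return total

def closed_form(y):
    """SS_j / (n_j (n_j - 1)), the closed form quoted in the text."""
    m = len(y)
    ybar = sum(y) / m
    return sum((v - ybar) ** 2 for v in y) / (m * (m - 1))

y1 = [2.1, 3.4, 1.9, 2.8]                    # illustrative data only
y2 = [5.0, 4.2, 6.1, 5.5, 4.9, 5.3]
for y in (y1, y2):
    print(delete_one_jackknife_var(y), closed_form(y))
```

The two columns agree up to rounding, and the closed form is the familiar unbiased estimator of the variance of a sample mean, which is why $E\,v_{J(1)}$ has entries $\sigma_j^2/n_j$.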

To obtain valid bootstrap variance estimators, we can draw a simple random sample $\{e_i^*\}_1^n$ with replacement from the "population" $\{r_i/\sqrt{1-k/n}\}_1^n$, where $r_i = y_i - x_i^T\hat\beta$ is the ith residual. Define the bootstrap data $y_i^* = x_i^T\hat\beta + e_i^*$, i = 1(1)n, by treating $\hat\beta$ as the true parameter with the above "population" of errors, and the bootstrap LSE is

(8.17)  $\beta^* = (X^TX)^{-1}X^Ty^*$ .

For any nonlinear estimator $\hat\theta = g(\hat\beta)$, the bootstrap variance estimator is defined as

(8.18)  $v_b = E_*(\theta^*-\hat\theta)(\theta^*-\hat\theta)^T$ ,  $\theta^* = g(\beta^*)$ ,  $\beta^*$ in (8.17).

Note that $\theta^* - \hat\theta$ is unweighted. When $\theta = \beta$, it is easy to see (Efron, 1979) that

(8.19)  $v_b = \mathrm{var} = \hat\sigma^2(X^TX)^{-1}$ ,  $\hat\sigma^2 = (n-k)^{-1}\sum_{i=1}^n r_i^2$ ;

therefore, for homoscedastic errors $\mathrm{Var}(e) = \sigma^2 I$, $v_b$ is a valid variance estimator. For constructing confidence intervals for $\theta$, note that each $y^*$ vector is associated with the same X matrix. The unweighted percentile method of Efron (1982) is as follows. Repeat the above procedure B times. Define $\mathrm{CDFB}(t)$ to be the unweighted empirical distribution function based on the B bootstrap estimates $\theta^{*b}$, b = 1(1)B. The bootstrap percentile method consists of taking

(8.20)  $[\mathrm{CDFB}^{-1}(\alpha),\ \mathrm{CDFB}^{-1}(1-\alpha)]$

as an approximate 1 - 2α central confidence interval for $\theta$. The interval (8.20) is computed with a continuity correction.
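
The residual bootstrap of (8.17)-(8.19) can be sketched in a few lines. To stay self-contained, the example assumes a straight-line model (intercept and slope, so k = 2) rather than the general linear model of the report. The function names, the simulated data and the Monte Carlo size B are all illustrative assumptions. It compares a Monte Carlo approximation of $v_b$ for the slope with the closed form $\hat\sigma^2(X^TX)^{-1}$ of (8.19).

```python
import random

def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1*x + e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def residual_bootstrap_slope_var(x, y, B=4000, seed=0):
    """Monte Carlo approximation of v_b in (8.18) for the slope."""
    rng = random.Random(seed)
    n, k = len(x), 2
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    # Rescaled residuals r_i / sqrt(1 - k/n): the bootstrap "population".
    pop = [(yi - fi) / (1 - k / n) ** 0.5 for yi, fi in zip(y, fitted)]
    total = 0.0
    for _ in range(B):
        y_star = [fi + rng.choice(pop) for fi in fitted]   # same X, new errors
        total += (fit_line(x, y_star)[1] - b1) ** 2
    return total / B

def usual_slope_var(x, y):
    """Slope entry of sigma_hat^2 (X^T X)^{-1}, as in (8.19)."""
    n, k = len(x), 2
    b0, b1 = fit_line(x, y)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    xbar = sum(x) / n
    return rss / (n - k) / sum((xi - xbar) ** 2 for xi in x)

x = [1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10]
gen = random.Random(1)
y = [2 + 0.5 * xi + gen.gauss(0, 1) for xi in x]   # illustrative data only
v_boot = residual_bootstrap_slope_var(x, y)
v_usual = usual_slope_var(x, y)
print(v_boot, v_usual)
```

Because the rescaled residuals are drawn without regard to which $x_i$ they came from, the agreement with $\hat\sigma^2(X^TX)^{-1}$ holds whatever the true error variances are, which is exactly why the method is not robust to heteroscedasticity.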

But for heteroscedastic errors $\mathrm{Var}(e) = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$, $v_b = \mathrm{var}$ does poorly, as demonstrated in Section 6 for the linear parameters. This should be quite clear from the nature of the procedure. The assumption underlying the drawing of i.i.d. samples from $\{r_i/\sqrt{1-k/n}\}$ is that the residuals $r_i$ are viewed as exchangeable. The first bootstrap residual $e_1^*$ may come from the tenth residual $r_{10}$, and so forth. The heterogeneity among the $r_i$ is lost in this mixing process. On the other hand the delete-one jackknife, by retaining the identity of the residuals, reflects the possible heterogeneity of $r_i$ and of the error variance $\sigma_i^2$.

Recognizing the model-dependent nature of the bootstrap residual method, Efron and Gong (1983, p. 43) seemed to favor the unweighted bootstrap method since it "takes less advantage of the special structure of the regression problem." However, their next statement that "the (unweighted bootstrap) method gives a trustworthy estimate of $\hat\beta$'s variability even if the regression model is not correct" cannot be substantiated, as one can easily infer from our counterexample.

Since the basic principle of the bootstrap is to simulate samples that resemble the unknown population, we must point out that the "population" $\{r_i/\sqrt{1-k/n}\}$ does not resemble the true population of errors $\{e_i\}$ in that the $r_i/\sqrt{1-k/n}$ are mildly correlated with nonconstant variances $\sigma^2(1-w_i)/(1-k/n)$ if $\mathrm{Var}(e) = \sigma^2 I$. One possibility is to replace the n values $r_i/\sqrt{1-k/n}$ by n - k uncorrelated residuals $\tilde e_i$ with variance $\sigma^2$, e.g. the BLUS residuals (Theil, 1971). If the errors $e_i$ are assumed to be normal, the $\tilde e_i$ are also normal. One may first apply a random orthogonal transformation T to $\{\tilde e_i\}_1^{n-k}$ to obtain $\{T\tilde e_i\}_1^{n-k}$, and then draw an i.i.d. sample from $\{T\tilde e_i\}_1^{n-k}$. A major problem is that the i.i.d. property of $\{\tilde e_i\}$ depends critically on the homoscedasticity and normality assumption.

9. Bias reduction

The nonlinear estimator $\hat\theta = g(\hat\beta)$ of $\theta = g(\beta)$ has in general a bias of order $n^{-1}$. In this section we will show that bias reduction is closely connected with the existence of an almost unbiased variance estimator. Assuming (C3) and the continuous third differentiability of g in a neighborhood of $\beta$, Taylor expansion gives

(9.1)  $\hat\theta = \theta + g'(\beta)^T(\hat\beta-\beta) + \frac12(\hat\beta-\beta)^Tg''(\beta)(\hat\beta-\beta) + O_p(n^{-1.5})$ ,

where $O_p(n^{-a})$ denotes terms of stochastic order $n^{-a}$. From (9.1), the bias of $\hat\theta$ is

(9.2)  $B(\hat\theta) = E\hat\theta - \theta = \frac12\,\mathrm{tr}(g''(\beta)\,\mathrm{Var}(\hat\beta)) + O(n^{-2})$ ,

where tr is the trace of a matrix. Since the reduction of bias of $\hat\theta$ amounts to estimating $B(\hat\theta)$ unbiasedly up to order $n^{-2}$, we will focus on the latter problem for the rest of the section. Data resampling makes it possible to estimate $B(\hat\theta)$ without computing the Hessian matrix $g''(\hat\beta)$. First we consider the jackknife resampling. Let $\tilde\theta_s = g(\tilde\beta_s)$ be defined in (7.3). Taylor expansion gives

(9.3)  $\tilde\theta_s = \hat\theta + g'(\hat\beta)^T(\tilde\beta_s-\hat\beta) + \frac12(\tilde\beta_s-\hat\beta)^Tg''(\hat\beta)(\tilde\beta_s-\hat\beta) + \eta_s$ ,

where $\eta_s$ is the remainder term. If the weights $w_s$ are uniformly bounded away from 0 and 1, the discussion around (5.6) and (5.6)' implies that $\tilde\beta_s - \hat\beta$ and $\hat\beta - \beta$ have the same stochastic order. If we further assume

(9.4)  $\hat\beta - \beta = O_p(n^{-0.5})$

and the continuous third differentiability of g around $\beta$, we have

(9.5)  $\eta_s = O_p(n^{-1.5})$ .


For the jackknife with subset size r, we propose to consider the following estimator of $B(\hat\theta)$,

(9.6)  $\tilde B_{J,r} = \sum_s w_s(\tilde\theta_s - \hat\theta)$ ,  $w_s \propto |X_s^TX_s|$ ,

where $\sum_s$ is summation over s in $S_r$. From (9.3) and Theorem 2,

(9.7)  $\tilde B_{J,r} = \frac12\sum_s w_s(\tilde\beta_s-\hat\beta)^Tg''(\hat\beta)(\tilde\beta_s-\hat\beta) + \sum_s w_s\eta_s = \frac12\,\mathrm{tr}(g''(\hat\beta)\,v_{J,r}) + \sum_s w_s\eta_s$ .

Since $g''(\hat\beta) = g''(\beta) + O_p(n^{-0.5})$ and $E(v_{J,r}) = \mathrm{Var}(\hat\beta)$ under (9.4) and $\mathrm{Var}(e) = \sigma^2 I$, the first term of (9.7) estimates $B(\hat\theta)$ unbiasedly up to order $n^{-2}$; the second term of (9.7) is of order $O_p(n^{-1.5})$ under the same assumption that led to $\eta_s = O_p(n^{-1.5})$, since the $w_s$ are assumed to be bounded away from 0 and 1. This leads us to the following theorem.


Theorem 7. Assume (C3), the continuous third differentiability of g around $\beta$ and that the weights $w_s$ are uniformly bounded away from 0 and 1. For homoscedastic errors $\mathrm{Var}(e) = \sigma^2 I$,

$E(\tilde B_{J,r}) = B(\hat\theta) + O(n^{-2})$ .

We assume (C3) in Theorem 7 since it implies (9.4) under $\mathrm{Var}(e) = \sigma^2 I$.

It is clear from the arguments leading to Theorem 7 that, for any general $\mathrm{Var}(e) = V$, as long as (9.4) and

(9.8)  $E\,v_{J,r} = \mathrm{Var}(\hat\beta) + O(n^{-2})$  under $\mathrm{Var}(e) = V$

are satisfied, the conclusion of Theorem 7 holds true. One such candidate is the delete-one jackknife variance estimator $v_{J(1)}$. For $V = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$, according to Theorem 5(ii),

$E\,v_{J(1)} = \mathrm{Var}(\hat\beta)(1 + O(n^{-1}))$  under (C1 - C2)
$\qquad\quad\;\, = \mathrm{Var}(\hat\beta) + O(n^{-2})$  under (C1 - C3) ,

where the second equality follows from $\mathrm{Var}(\hat\beta) = O(n^{-1})$ under (C3). Since (C1) is implied by (C3 - C4), we have the following corollary.

Corollary 3. Under the conditions of Theorem 7, (C2) and (C4), for heteroscedastic errors $\mathrm{Var}(e) = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$,

(9.9)  $E\tilde B_{J(1)} = B(\hat\theta) + O(n^{-2})$ ,

where $w_i = x_i^T(X^TX)^{-1}x_i$, $\tilde\beta_{(i)} = \hat\beta + \sqrt{n-k}\,(\hat\beta_{(i)}-\hat\beta)$ and

(9.10)  $\tilde B_{J(1)} = \sum_{i=1}^n \frac{1-w_i}{n-k}\{g(\tilde\beta_{(i)}) - g(\hat\beta)\}$ .

Since $v_{H(1)}$ also satisfies (9.8) for $V = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$, one would expect a result similar to Corollary 3. Hinkley (1977) considered the following estimator of $B(\hat\theta)$,

(9.11)  $\hat B_{J(1)} = \sum_{i=1}^n (1-w_i)\{g(\hat\beta_{(i)}) - g(\hat\beta)\}$ ,

and demonstrated its unbiasedness (without spelling out proper regularity conditions) for the homoscedastic case. A stronger result will be proved next. Consider the expansion

(9.12)  $g(\hat\beta_{(i)}) = g(\hat\beta) + g'(\hat\beta)^T(\hat\beta_{(i)}-\hat\beta) + \frac12(\hat\beta_{(i)}-\hat\beta)^Tg''(\hat\beta)(\hat\beta_{(i)}-\hat\beta) + \eta_{(i)}$ ,

where the remainder term $\eta_{(i)} = O_p(n^{-3})$ since $g'''$ is bounded in a neighborhood of $\beta$ and $\hat\beta_{(i)} - \hat\beta = O_p(n^{-1})$. (9.12) gives

(9.13)  $\hat B_{J(1)} = \frac12\,\mathrm{tr}\{g''(\hat\beta)\,v_{J(1)}\} + \sum_{i=1}^n (1-w_i)\eta_{(i)}$ ,

which reveals the surprising connection of the bias estimator $\hat B_{J(1)}$ with $v_{J(1)}$, instead of with its twin $v_{H(1)}$. Both $\hat B_{J(1)}$ and $v_{H(1)}$ were motivated by pseudo-values (Hinkley, 1977). (The connection seems to suggest that $v_{J(1)}$ is a more natural estimator than $v_{H(1)}$.) In Lemma 5, to be stated later, we will prove $\sum_{i=1}^n (1-w_i)\eta_{(i)} = O_p(n^{-1.5})$, which in conjunction with (9.13) and Theorem 6(ii) yields the following result.


Corollary 4. Assume the continuous third differentiability of g near $\beta$ and (C2 - C4). For heteroscedastic errors $\mathrm{Var}(e) = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$,

$E\hat B_{J(1)} = B(\hat\theta) + O(n^{-2})$ .

The proof parallels that of Corollary 3, the key steps being associated with (9.13), Theorem 6(ii) and Lemma 5. The main difference between Corollaries 3 and 4 is that some of the regularity conditions required in Corollary 3 are automatically satisfied in Corollary 4. The reason is that $\hat\beta_{(i)}$ is much closer to $\hat\beta$ than $\tilde\beta_{(i)}$ is to $\hat\beta$. This brings home the problem of choosing between $\tilde B_{J(1)}$ and $\hat B_{J(1)}$. In terms of imitating the behavior of $g(\hat\beta) - g(\beta)$, whose expectation is the bias $B(\hat\theta)$, we prefer $\tilde B_{J(1)}$ since it uses $\tilde\beta_{(i)}$ and $\hat\beta$, whose distance matches that of $\hat\beta - \beta$, whereas $\hat\beta_{(i)} - \hat\beta$ in $\hat B_{J(1)}$ is much smaller than $\hat\beta - \beta$. See the relevant discussion in Section 7. This difference will probably not be detectable quantitatively unless g is markedly nonlinear. On the other hand, for very smooth g, heuristic (in contrast to rigorous) computations show that the error term $\sum(1-w_i)\eta_{(i)}$ in $\hat B_{J(1)}$, (9.13), is of stochastic order $n^{-2}$ while the error term $\sum w_s\eta_s$ in $\tilde B_{J,r}$ (including $\tilde B_{J(1)}$), (9.7), is of stochastic order $n^{-1.5}$, suggesting that $\hat B_{J(1)}$ is a better approximation to $B(\hat\theta)$.
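
The two delete-one bias estimators (9.10) and (9.11) differ only in where the distance adjustment is applied, which a small computation makes concrete. The sketch below is illustrative only: the helper names (`solve`, `gram`, `lse`, `jackknife_bias`), the simulated data and the choice of g are assumptions, and the least squares fits use the normal equations for brevity rather than a numerically preferable QR factorization.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    z = [0.0] * m
    for c in reversed(range(m)):
        z[c] = (M[c][m] - sum(M[c][j] * z[j] for j in range(c + 1, m))) / M[c][c]
    return z

def gram(X):
    """X^T X for a list-of-rows design matrix."""
    k = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]

def lse(X, y):
    """Least squares estimate via the normal equations."""
    k = len(X[0])
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(gram(X), xty)

def jackknife_bias(X, y, g):
    """Return (B_tilde, B_hat), the estimators (9.10) and (9.11)."""
    n, k = len(X), len(X[0])
    beta, G = lse(X, y), gram(X)
    b_tilde = b_hat = 0.0
    for i in range(n):
        w_i = sum(u * v for u, v in zip(X[i], solve(G, X[i])))   # leverage
        beta_i = lse(X[:i] + X[i + 1:], y[:i] + y[i + 1:])       # delete-one LSE
        stretched = [b + (n - k) ** 0.5 * (bi - b) for b, bi in zip(beta, beta_i)]
        b_tilde += (1 - w_i) / (n - k) * (g(stretched) - g(beta))  # internal adjustment
        b_hat += (1 - w_i) * (g(beta_i) - g(beta))                 # external adjustment
    return b_tilde, b_hat

xs = [1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10]
X = [[1.0, x, x * x] for x in xs]
rng = random.Random(0)
y = [4 * x - 0.5 * x * x + rng.gauss(0, 1) for x in xs]   # illustrative data only
ratio = lambda b: -b[1] / (2 * b[2])                      # a nonlinear g
print(jackknife_bias(X, y, ratio))
print(jackknife_bias(X, y, lambda b: b[1]))               # linear g: both vanish
```

For a linear g both estimators are zero up to rounding, because $\sum_i (1-w_i)(\hat\beta_{(i)}-\hat\beta) = 0$; any nonzero value therefore reflects the curvature of g.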


Lemma 5. Under the conditions of Corollary 4,

$\sum_{i=1}^n (1-w_i)\eta_{(i)} = O_p(n^{-1.5})$ ,

where $\eta_{(i)}$ is defined in (9.12).

Proof: Denote $\hat\beta_{(i)} - \hat\beta = (d_{il})_{l=1}^k$. Since $g'''$ is bounded near $\beta$,

(9.14)  $\left|\sum_{i=1}^n (1-w_i)\eta_{(i)}\right| \le M \sum_{i=1}^n (1-w_i)\sum_{l,m,p=1}^k |d_{il}\,d_{im}\,d_{ip}|$ ,

where $M < \infty$ is independent of n. Under (C2 - C4), it is proved in Lemma 6 that $\max_i |\hat\beta_{(i)} - \hat\beta| = O_p(n^{-0.5})$. Continuing (9.14), we have

(9.15)  $\left|\sum_{i=1}^n (1-w_i)\eta_{(i)}\right| \le O_p(n^{-0.5}) \sum_{l,m=1}^k \sum_{i=1}^n (1-w_i)|d_{il}\,d_{im}|$ .

For l = m,

(9.16)  $\sum_{i=1}^n (1-w_i)d_{il}^2 = (l,l)$ element of $v_{J(1)} = O_p(n^{-1})$ ;

for l ≠ m,

(9.17)  $\sum_{i=1}^n (1-w_i)|d_{il}\,d_{im}| = O_p(n^{-1})$

follows from (9.16) and the Cauchy-Schwarz inequality. Combining (9.15) - (9.17), we have the desired result. □

Note that $O_p(n^{-1.5})$ is only an upper bound of the order of $\sum(1-w_i)\eta_{(i)}$ since $\eta_{(i)} = O_p(n^{-3})$ and the sum of n $O_p(n^{-3})$ terms is likely to be of $O_p(n^{-2})$.

Lemma 6. Under (C2 - C4) and $\mathrm{Var}(e) = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2)$,

$\max_i |\hat\beta_{(i)} - \hat\beta| = O_p(n^{-0.5})$ .

For $\mathrm{Var}(e) = \sigma^2 I$, this follows from the proof of Lemma 3.3 of Miller (1974). For unequal $\sigma_i^2$, the only change involves using (6.4) of our Lemma 4. Note that (C2 - C4) implies (C1 - C3) as shown after Theorem 5. (C1 - C2) guarantees (6.4) and (C3) guarantees $(X^TX)^{-1} = O(n^{-1})$.

For a general jackknife with subset size r, $\hat B_{J(1)}$ can be extended to

(9.18)  $\hat B_{J,r} = \frac{r-k+1}{n-r}\sum_{s\in S_r} w_s\,(g(\hat\beta_s) - g(\hat\beta))$ ,  $w_s$ in (9.6).

The difference between $\hat B_{J,r}$ and $\tilde B_{J,r}$ is analogous to that between $\hat v_{J,r}(\hat\theta)$, (7.1), and $\tilde v_{J,r}(\hat\theta)$, (7.2). The former applies the scale adjustment (r-k+1)/(n-r) externally and the latter internally. As $\tilde B_{J,r}$ does in Theorem 7, $\hat B_{J,r}$ also estimates the leading term of $B(\hat\theta)$ unbiasedly.

Let us now turn our attention to the bootstrap sampling. Since the unbiased estimation for $B(\hat\theta)$ hinges on the unbiased estimation for $\mathrm{Var}(\hat\beta)$, from the study of Section 8, we need only consider the last bootstrap method, (8.17) - (8.19), considered there. Let $\beta^* = (X^TX)^{-1}X^Ty^*$ be the bootstrap LSE defined in (8.17), where * denotes the bootstrap (or i.i.d.) sampling from the rescaled residuals. From the unbiasedness of the LSE,

(9.19)  $E_*\beta^* = \hat\beta$ ,

and from (8.19),

(9.20)  $E_*(\beta^* - \hat\beta)(\beta^* - \hat\beta)^T = \hat\sigma^2(X^TX)^{-1}$

is unbiased for $\mathrm{Var}(\hat\beta)$ under $\mathrm{Var}(e) = \sigma^2 I$. By repeating the steps (9.3) - (9.7) and using (9.19) and (9.20), the proposed bootstrap estimator of bias

(9.21)  $\hat B_{\mathrm{boot}} = E_*\theta^* - \hat\theta$ ,  $\theta^* = g(\beta^*)$ ,

is equal to

(9.22)  $\frac12\,\hat\sigma^2\,\mathrm{tr}(g''(\hat\beta)(X^TX)^{-1}) + E_*\eta_*$ ,

where $\eta_*$ is the remainder term of the expansion

$\theta^* = \hat\theta + g'(\hat\beta)^T(\beta^* - \hat\beta) + \frac12(\beta^* - \hat\beta)^Tg''(\hat\beta)(\beta^* - \hat\beta) + \eta_*$ .

From (9.21) - (9.22), we have

Theorem 8. Assume (C3), the continuous third differentiability of g near $\beta$ and

(9.23)  $E_*\eta_* = O_p(n^{-1.5})$ .

For homoscedastic errors $\mathrm{Var}(e) = \sigma^2 I$,

$E\hat B_{\mathrm{boot}} = B(\hat\theta) + O(n^{-2})$ .

This unbiasedness result cannot be extended to the heteroscedastic case because of (9.20). The condition (9.23) is a reasonable one since $\eta_* = O_p(n^{-1.5})$ follows from $\beta^* - \hat\beta = O_p(n^{-0.5})$, which is a consequence of (C3) and the conditional central limit theorem of $\beta^*$ (Freedman, 1981, Theorem 2.2).

10.  Simulation  results 

In this section we examine the Monte Carlo behavior of (i) the bias of several estimators of the variance-covariance matrix of the least squares estimator, $\mathrm{Var}(\hat\beta)$, (ii) the bias of several estimators of a nonlinear parameter $\theta = g(\beta)$, and (iii) the coverage probability and length of the associated interval estimators of the same nonlinear parameter.

Under consideration is the following quadratic regression model:

$y_i = \beta_0 + \beta_1x_i + \beta_2x_i^2 + e_i$ ,  i = 1(1)12 ,
$x_i$ = 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10 .

Two variance patterns are considered:

Unequal variances: $e_i = (x_i/2)^{1/2}\,N(0,1)$ ,
Equal variances: $e_i = N(0,1)$ .

The $e_i$'s are independent. For unequal variances the variance-covariance matrix of the ordinary least squares estimator $\hat\beta$ is (upper triangle of the symmetric matrix shown)

$\mathrm{Var}(\hat\beta) = \begin{pmatrix} 1.50 & -0.79 & 0.08 \\ & 0.48 & -0.05 \\ & & 0.01 \end{pmatrix}$ ,

while the expectation (see (6.12)) of the usual variance estimator var is

$E(\hat\sigma^2)(X^TX)^{-1} = \begin{pmatrix} 2.09 & -0.87 & 0.07 \\ & 0.42 & -0.04 \\ & & 0.00 \end{pmatrix}$ .

Because of the heterogeneity of errors, the two matrices are quite different. For a variance estimator v, its bias is defined as

$B(v) = E(v) - \mathrm{Var}(\hat\beta)$ .
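
The two matrices above can be recomputed from the design. The sketch below assumes that the unequal-variance pattern is $\mathrm{Var}(e_i) = x_i/2$ (a reading of the scanned source that reproduces the printed entries) and uses the standard identities $\mathrm{Var}(\hat\beta) = (X^TX)^{-1}X^TVX(X^TX)^{-1}$ and $E\hat\sigma^2 = \sum_i (1-w_i)\sigma_i^2/(n-k)$. The helper names are illustrative.

```python
def inv3(A):
    """Inverse of a 3x3 matrix by cofactors."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def xtdx(X, d):
    """X^T diag(d) X for a list-of-rows design matrix X."""
    k = len(X[0])
    return [[sum(di * row[a] * row[b] for di, row in zip(d, X))
             for b in range(k)] for a in range(k)]

def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

xs = [1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10]
X = [[1.0, x, x * x] for x in xs]
n, k = len(xs), 3
sigma2 = [x / 2 for x in xs]              # assumed unequal-variance pattern

A_inv = inv3(xtdx(X, [1.0] * n))          # (X^T X)^{-1}
var_beta = matmul(matmul(A_inv, xtdx(X, sigma2)), A_inv)

# Leverages w_i = x_i^T (X^T X)^{-1} x_i and the mean of the usual sigma_hat^2.
w = [sum(r[a] * A_inv[a][b] * r[b] for a in range(k) for b in range(k)) for r in X]
e_sigma2 = sum((1 - wi) * s for wi, s in zip(w, sigma2)) / (n - k)
expected_usual = [[e_sigma2 * v for v in row] for row in A_inv]

for row in var_beta:
    print([round(v, 2) for v in row])
for row in expected_usual:
    print([round(v, 2) for v in row])
```

The difference between the two printed matrices is the theoretical bias of the usual estimator under this variance pattern, which the simulated bias B(var) reported below should approximate.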

Four variance estimators are considered: (1) the usual variance estimator var (5.9), which is identical to the bootstrap variance estimator $v_b$ (8.18), (2) the delete-one jackknife variance estimator $v_{J(1)}$ (5.16), (3) Hinkley's delete-one jackknife variance estimator $v_{H(1)}$ (6.10), (4) the retain-eight jackknife variance estimator $v_{J,8}$ (5.7). The following results are based on 3000 simulations on a VAX 11/780 at the University of Wisconsin-Madison. The normal random numbers are generated according to the IMSL subroutine GGNML. The same set of normal random numbers is used throughout the study.

(Upper triangles of the symmetric bias matrices are shown; "?" marks a digit that is illegible in the source.)

$B(\mathrm{var}) = \begin{pmatrix} 0.58 & -0.07 & -0.0? \\ & -0.05 & 0.01 \\ & & -0.00 \end{pmatrix}$ ,  $B(v_{J(1)}) = \begin{pmatrix} 0.02 & 0.03 & -0.00 \\ & -0.04 & 0.01 \\ & & -0.00 \end{pmatrix}$ ,

$B(v_{H(1)}) = \begin{pmatrix} -0.23 & 0.19 & -0.03 \\ & -0.14 & 0.02 \\ & & -0.00 \end{pmatrix}$ ,  $B(v_{J,8}) = \begin{pmatrix} 0.09 & 0.01 & -0.01 \\ & -0.04 & 0.01 \\ & & -0.00 \end{pmatrix}$ .

In the unequal variance case, var is known to be biased (6.12) - (6.13), and $v_{J(1)}$ to be almost unbiased (Theorem 5). Both are confirmed by the simulations. The robustness of $v_{J,8}$ conjectured in Section 6 is also confirmed. The only surprise is the poor performance of $v_{H(1)}$. The claimed robustness (as n becomes large) of $v_{H(1)}$ in Theorem 6 does not hold up here. Its bias is quite nontrivial. This prompted us to examine the bias behavior of $v_{H(1)}$ in the equal variance case, since $v_{H(1)}$ is the only one that is not exactly unbiased (Theorem 6(ii)). In this case, $\mathrm{Var}(\hat\beta) = \sigma^2(X^TX)^{-1}$. The bias of $v_{H(1)}$ is not negligible,

$B(v_{H(1)}) = \begin{pmatrix} -0.13 & 0.07 & -0.01 \\ & -0.04 & 0.00 \\ & & -0.00 \end{pmatrix}$ ,

and the biases of var, $v_{J(1)}$ and $v_{J,8}$ are all very small (none of the entries exceeds 0.0102 in magnitude). Another thing to note is that all the diagonal elements of $v_{H(1)}$ are downward-biased in the simulation. The poor performance of $v_{H(1)}$ in both cases should cause its users some concern, at least in small sample situations.

We next consider bias reduction and interval estimation for the nonlinear parameter

$\theta = -\beta_1/(2\beta_2)$ ,

which maximizes the quadratic function $\beta_0 + \beta_1x + \beta_2x^2$ over x. Six point estimators are considered: $\hat\theta$, $\hat\theta_{J(1)} = \hat\theta - \hat B_{J(1)}$, (9.13), $\tilde\theta_{J(1)} = \hat\theta - \tilde B_{J(1)}$, $\hat\theta_{J,8} = \hat\theta - \hat B_{J,8}$, (9.18), $\tilde\theta_{J,8} = \hat\theta - \tilde B_{J,8}$, (9.6), and $\hat\theta_{\mathrm{boot}} = \hat\theta - \hat B_{\mathrm{boot}}$, (9.21). In drawing the bootstrap samples, the uniform random integers are generated according to the IMSL subroutine GGUD. The number of bootstrap samples B is 480, which is comparable to 495, the total number of jackknife subsets of size 8.

Their average biases are given in Table 1 for $\beta_0 = 0$, $\beta_1 = 4$ and several values of $\beta_2$. Bias reduction is more difficult to achieve when $\beta_2$ gets closer to 0, since $\theta$ becomes a more curved function of $\beta_2$, and when the variances are unequal. In the most nonlinear situation, $\beta_2 = -0.25$ and unequal variances, only $\hat\theta_{J(1)}$ and $\hat\theta_{\mathrm{boot}}$ achieve mild reduction of bias and the other estimators in fact have bigger biases. In all the other situations, the two jackknife estimators $\hat\theta_{J(1)}$ and $\hat\theta_{J,8}$ achieve substantial reduction of bias. On the other hand, the other two jackknife estimators $\tilde\theta_{J(1)}$ and $\tilde\theta_{J,8}$, based on internal adjustment of distance, do not perform as well. This is consistent with the asymptotic comparison given before Lemma 5. What puzzles us is the unpredictable behavior of the bootstrap estimator $\hat\theta_{\mathrm{boot}}$ for $\beta_2 = -0.25$. According to Theorem 8, $\hat\theta_{\mathrm{boot}}$ reduces bias for equal variances but not for unequal variances. What we see in Table 1 is the contrary. It appears that the curvature effect is the dominant factor here.


Table 1. Biases of six estimators of $\theta$ (based on 3000 simulation samples); $\beta_0 = 0$, $\beta_1 = 4$

                                 Unequal variances                 Equal variances
estimator                  $\beta_2$ = -0.25   -0.35    -0.5    -1.0      $\beta_2$ = -0.25    -1.0

$\hat\theta$                      0.41      0.05   -0.02   -0.01             0.08   -0.01
$\hat\theta_{J(1)}$              -0.22     -0.01    0.00   -0.00            -0.05   -0.00
$\tilde\theta_{J(1)}$             0.63      0.06    0.02    0.00             0.02   -0.00
$\hat\theta_{J,8}$                1.48      0.00    0.00   -0.00             0.01   -0.00
$\tilde\theta_{J,8}$              2.39      0.05   -0.01   -0.00            -0.08   -0.00
$\hat\theta_{\mathrm{boot}}$      0.16      0.02    0.01   -0.00            -0.12   -0.00

We now consider interval estimation for $\theta$. For equal variances, the classical Fieller's interval is exact. In the context of maximizing the quadratic function, the exact (1 - 2α) Fieller's interval is (Williams, 1959, p. 111)

(10.1)
(I)    $(-\infty, \infty)$  if  $(1-g_{12})^2 < (1-g_{11})(1-g_{22})$ ,
(II)   $(-\infty, \theta_L] \cup [\theta_U, \infty)$  if  $(1-g_{12})^2 \ge (1-g_{11})(1-g_{22})$ ,  $g_{22} > 1$ ,
(III)  $[\theta_L, \theta_U]$  otherwise,

where $\theta_L$, $\theta_U$ are the smaller and larger values respectively of

$\hat\theta\,\{1 - g_{12} \pm [(1-g_{12})^2 - (1-g_{11})(1-g_{22})]^{1/2}\}/(1-g_{22})$ ,

(10.2)  $g_{ij} = t_\alpha^2\,\hat\sigma^2\,v^{ij}/(\hat\beta_i\hat\beta_j)$ ,  $[v^{ij}]_{0\le i,j\le 2} = (X^TX)^{-1}$ ,

and $t_\alpha$ is the upper α percentage point of a t-distribution with n - 3 (here 9) degrees of freedom, and $\hat\sigma^2$ is the usual variance estimator (5.9) (by assuming equal variances). Fieller's interval estimate is unbounded in the case of (I) or (II) of (10.1). The method is not exact if the variances are unequal.
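
The case distinction in (10.1) is easy to mechanize. The sketch below is a hedged illustration rather than the simulation code of the report: the function name `fieller_interval` is hypothetical, the inputs are the estimates $\hat\beta_1$, $\hat\beta_2$, their estimated variances and covariance ($\hat\sigma^2 v^{ij}$), and a t critical value supplied by the caller. The numbers in the example calls are made up.

```python
def fieller_interval(b1, b2, v11, v12, v22, t):
    """Fieller set for theta = -b1 / (2*b2), following (10.1)-(10.2).

    v11, v12, v22 are the estimated variances and covariance of (b1, b2);
    t is the t critical value. Returns (case, theta_L, theta_U); the bounds
    are None in case "I". The boundary g22 == 1 is not handled.
    """
    theta = -b1 / (2 * b2)
    g11 = t * t * v11 / (b1 * b1)
    g12 = t * t * v12 / (b1 * b2)
    g22 = t * t * v22 / (b2 * b2)
    disc = (1 - g12) ** 2 - (1 - g11) * (1 - g22)
    if disc < 0:
        return "I", None, None                 # whole real line
    root = disc ** 0.5
    ends = sorted(theta * (1 - g12 + s * root) / (1 - g22) for s in (1, -1))
    # Case II: the set is the complement of the open interval (theta_L, theta_U).
    return ("II" if g22 > 1 else "III"), ends[0], ends[1]

t9 = 2.262   # upper 2.5% point of the t-distribution with 9 degrees of freedom
print(fieller_interval(4.0, -0.5, 0.04, -0.004, 0.0004, t9))   # bounded set
print(fieller_interval(4.0, -0.1, 0.04, 0.0, 0.01, t9))        # unbounded set
```

Case (II) arises when $\hat\beta_2$ is not significantly different from zero ($g_{22} > 1$), which is why it shows up in the simulation only for the values of $\beta_2$ closest to 0.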

Altogether nine methods are compared in our simulation. A description is given below.

symbol    method                            interval estimate

Fieller   Fieller's interval                (10.1)
VCJ(1)    Delete-1 jackknife                $\hat\theta \pm t_\alpha\sqrt{\tilde v_{J,n-1}(\hat\theta)}$ , (7.2)
VHJ(1)    Delete-1 jackknife                $\hat\theta \pm t_\alpha\sqrt{\hat v_{J,n-1}(\hat\theta)}$ , (7.1)
VCJ8      Retain-8 jackknife                $\hat\theta \pm t_\alpha\sqrt{\tilde v_{J,8}(\hat\theta)}$ , (7.2)
VHJ8      Retain-8 jackknife                $\hat\theta \pm t_\alpha\sqrt{\hat v_{J,8}(\hat\theta)}$ , (7.1)
VBOOT     Bootstrap variance                $\hat\theta \pm t_\alpha\sqrt{v_b}$ , (8.18)
VLIN      Linear approximation              (7.4)
PBOOT     Bootstrap percentile              $[\mathrm{CDFB}^{-1}(\alpha),\ \mathrm{CDFB}^{-1}(1-\alpha)]$ , (8.20)
PJ8       Jackknife percentile (retain-8)   $[\mathrm{CDFJ}^{-1}(\alpha),\ \mathrm{CDFJ}^{-1}(1-\alpha)]$ , (7.5)

(V: variance, C: curl, H: hat, P: percentile)

The average coverage probabilities (based on 3000 samples) for these nine methods are given in Table 2 for five sets of parameters. Since Fieller's interval in the case of (I) and (II) of (10.1) has infinite length, we break the 3000 simulation samples into categories (I), (II) and (III) according to which category the corresponding Fieller's intervals belong to. In our simulation samples (I) never happens; (II) happens only when $\beta_2 = -0.25$ and $-0.35$. In these two cases, the median length of each interval estimate is computed separately for category (II) and category (III) and is given in Table 3. For the rest, the median length over 3000 samples is given inside the parentheses in Table 2. We do not report the average lengths since they are too much influenced by a few extreme values. Take $\beta_2 = -0.25$ and unequal variances as an example. The average lengths for VCJ8, VHJ8 and VBOOT in category (III) are 176.85, 365.76 and 39.54 respectively, while the medians are 10.65, 6.64 and 3.73. The three methods perform unstably in highly nonlinear situations.


Table 2. Average coverage probabilities and median lengths for nine interval estimation methods (3000 simulation samples). Nominal level = 0.95, $\beta_0 = 0$, $\beta_1 = 4$

                        Unequal variances                              Equal variances
method    $\beta_2$ = -0.25   -0.35    -0.5          -1.0           $\beta_2$ = -0.25     -1.0

Fieller       .858     .866    .968 (.98)    .952 (.92)      .947 (2.48)    .950 (.64)
VCJ(1)        .887     .848    .961 (.91)    .950 (.89)      .904 (2.03)    .935 (.62)
VHJ(1)        .866     .845    .950 (.87)    .947 (.87)      .899 (1.94)    .935 (.62)
VCJ8          .946     .920    .968 (.97)    .953 (.90)      .947 (3.19)    .939 (.63)
VHJ8          .931     .908    .965 (.93)    .953 (.90)      .941 (2.69)    .939 (.63)
VBOOT         .866     .902    .973 (.97)    .955 (.91)      .956 (2.42)    .946 (.64)
VLIN          .865     .891    .969 (.93)    .952 (.90)      .949 (2.18)    .948 (.64)
PBOOT         .829     .814    .940 (.84)    .921 (.79)      .912 (2.05)    .916 (.56)
PJ8           .809     .755    .909 (.78)    .912 (.78)      .831 (1.90)    .900 (.55)

(median length of interval estimate inside the parentheses)

Table 3. Median lengths of nine interval estimates of category (II) and category (III); $\beta_0 = 0$, $\beta_1 = 4$, unequal variances

                 $\beta_2$ = -0.25             $\beta_2$ = -0.35
method       II (199)*   III (2801)      II (7)    III (2993)

Fieller         ∞           3.81           ∞          1.10
VCJ(1)        29.08         3.87          8.92        1.04
VHJ(1)        15.17         3.13          5.63        0.98
VCJ8         223.67        10.65         38.08        1.59
VHJ8         166.81         6.64         49.80        1.37
VBOOT        313.17         3.73         86.63        1.07
VLIN          14.75         2.91          5.82        1.02
PBOOT         55.05         3.07         17.78        0.93
PJ8           28.54         3.34          8.22        0.92

*The figure inside the parentheses is the number of simulation samples belonging to the category.

The  results  can  be  auaaaarised  as  follows  > 

1.  Kffect  of  parameter  nonlinearity.  When  the  parameter  8  becomes  more  nonlinear  C6 2 
closer  to  0),  all  the  intervals  become  wider  and  the  associated  converage 
probabilities  smaller.  The  phenomenon  is  especially  pronounced  for  unequal  variances 
and  »  -0.25,  -0.35,  where  we  observe  the  Fieller  paradox  (i.e.,  Fieller's  intervals 
take  the  form  (10.1)  (IX).)  In  these  two  cases,  only  the  two  retain-8  jackknife  methods 
provide  Intervals  with  good  coverage  probabilities.  But  the  price  is  dear.  Both  the 
mean  and  median  lengths  of  their  intervals  are  quite  big  even  in  category  (XIX)  where 
Fieller's  interval  is  reasonably  tight  but,  of  course,  with  poor  coverage  probability. 

In  the  other  cases,  the  first  seven  method*  all  do  reasonably  well. 

2.  Kffect  of  error  variance  heterogeneity.  As  the  theory  indicates,  the  general  perfor¬ 
mance  is  less  desirable  in  the  unequal  variance  case.  Fieller’s  interval  is  far  from 
being  exact  for  62  *  -0.25,  -0.35  and  unequal  variances.  For  equal  variances 
Fieller's  method  is  almost  exact  and  the  next  six  methods  (t-intervals  with  various 
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variance  estimates)  perform  raaaonably  wall  avan  in  tha  Boat  nonlinaar  casa  P ^  »  -0.25. 
the  two  ratal n-8  jaekknlfa  methods  ara  laaat  affactad  by  tha  hataroganalty  of  variancaa. 

3. Undercoverage of the percentile methods. This is very disappointing in view of the
second order asymptotic results on the bootstrap (Singh, 1981; Beran, 1982), which are
used as evidence of the superiority of the bootstrap approximation over the classical
t approximation. The undercoverage of the bootstrap percentile and the jackknife
percentile methods, with the latter being the more serious one, is partly due to the fact
that their associated intervals are shorter. But note from Tables 2 and 3 that the
linearization variance method (VLIN) has comparable interval length and yet higher
coverage probabilities. We think the problem is a more intrinsic one. We speculate
that this shortcoming has something to do with the skewness and light-tailedness of the
bootstrap and jackknife histograms.

4. Fieller's method is exact in the equal variance case even when the parameter is
considerably nonlinear, but is quite vulnerable to error variance heterogeneity.

5. The linearization method is a winner. This is most surprising since we cannot find a
theoretical justification. The intervals are consistently among the shortest, and the
coverage probabilities are quite comparable to the others (except for $\beta_2 = -0.25, -0.35$
and unequal variances, where VCJ8 and VHJ8 are the best). The linearization method
compares favorably with Fieller's method. The former has consistently shorter intervals
than the latter and the coverage probabilities are very close. In fact for
$\beta_2 = -0.25, -0.35$ and unequal variances, VLIN has much shorter intervals and much higher
coverage probabilities. Note that Fieller's intervals are unbounded in 199 ($\beta_2 = -0.25$)
and 7 ($\beta_2 = -0.35$) out of 3000 samples (Table 3).

6. Internal (curl) or external (hat) adjustment in jackknife variance estimation? In
general the curl jackknife gives wider intervals than the hat jackknife. On the other
hand the coverage probabilities of the two methods are very comparable. Further
research is needed to sort out the relative merits of the two adjustment methods in more
general situations.
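
The effect of parameter nonlinearity noted in point 1 can be illustrated with a small sketch of a linearization (delta-method) variance. It assumes, as an illustration only, that the nonlinear parameter is the turning point theta = -beta_1 / (2 beta_2) of a quadratic regression; the function names, the numerical gradient and the identity covariance matrix are hypothetical and are not taken from the simulation itself.

```python
import numpy as np

def linearization_variance(beta_hat, cov_beta, g, eps=1e-6):
    """Delta-method variance of g(beta_hat): grad^T cov_beta grad.

    The gradient is approximated by central differences, so g only needs
    to be smooth near beta_hat.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    grad = np.empty_like(beta_hat)
    for j in range(beta_hat.size):
        step = np.zeros_like(beta_hat)
        step[j] = eps
        grad[j] = (g(beta_hat + step) - g(beta_hat - step)) / (2 * eps)
    return float(grad @ cov_beta @ grad)

def theta(beta):
    # Assumed form of the nonlinear parameter (hypothetical here):
    # turning point of beta_0 + beta_1 * x + beta_2 * x**2.
    return -beta[1] / (2.0 * beta[2])

cov = np.eye(3)  # placeholder covariance of beta_hat, for illustration only
for b2 in (-1.0, -0.5, -0.25):
    v = linearization_variance([0.0, 1.0, b2], cov, theta)
    print(f"beta_2 = {b2:5.2f}  linearized variance of theta = {v:.2f}")
```

Under these assumptions the linearized variance grows quickly as beta_2 approaches 0 (roughly 5 at beta_2 = -0.5 and roughly 68 at beta_2 = -0.25 for the placeholder covariance), which is in line with the wider intervals reported for the more nonlinear cases.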


11.  Concluding  remarks  and  further  questions 

The main ideas and results of this paper can be summarized as follows:

1. The general representation of the full-data least squares estimate as a weighted average
of the resample-data least squares estimates for general resampling plans. We expect to
see further applications of this representation.

2. The proper weight for each subset least squares estimate is proportional to the
determinant of the $X^T X$ matrix of the subset. Since the latter matrix is proportional
to the Fisher information matrix of the subset, it immediately suggests an extension of
our general jackknife procedure to nonlinear regression models and generalized linear
models (McCullagh and Nelder, 1983). For each subset, the corresponding nonlinear least
squares estimate or maximum likelihood estimate is computed and the Fisher information
matrix of the subset is evaluated at the estimated parameter value. The formulae
developed in the paper can be applied in a straightforward manner.

3.  The  delete-one  jackknife  variance  estimator  is  robust  against  error  heterogeneity.  None 
of  the  bootstrap  methods  under  consideration  is  robust.  Bootstrapping  the  residuals  is 
too  model-dependent  to  be  a  robust  tool. 

4. The scope of the jackknife method is broadened with the introduction of the (weighted)
jackknife histogram and the interval estimation method based on its percentiles. It is
made possible by the more flexible choice of subset size and the weighting factor
discussed in 2. Although the two percentile methods do not perform well in the simulation,
an effective modification of the jackknife percentile method will probably have to
incorporate the above two elements.

5.  The  problem  of  bias  reduction  is  intimately  related  to  unbiased  estimation  of 
variance.  This  is  especially  interesting  when  the  latter  is  not  easy  to  achieve,  e.g. 
in  the  heteroscedastic  situation. 
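
Points 1 and 2 above can be checked numerically on a small linear model. The sketch below enumerates every subset of size r, weights each subset least squares estimate by the determinant of its $X_s^T X_s$ matrix, and forms a weighted jackknife variance with the scale factor (r-k+1)/(n-r). The function names, the singularity threshold and the simulated data are assumptions made for illustration, and the variance formula is written as we read it from the summary here rather than copied from the earlier sections.

```python
from itertools import combinations
import numpy as np

def subset_fits(X, y, r):
    """Yield (det(X_s^T X_s), beta_s) for each size-r subset s of the rows."""
    n = X.shape[0]
    for idx in combinations(range(n), r):
        Xs, ys = X[list(idx)], y[list(idx)]
        G = Xs.T @ Xs
        d = np.linalg.det(G)
        # Singular subsets get zero weight, so skipping them is harmless.
        # The absolute threshold is an illustrative choice.
        if d > 1e-12:
            yield d, np.linalg.solve(G, Xs.T @ ys)

def weighted_jackknife(X, y, r):
    """Return (full-data LSE, determinant-weighted average, variance estimate)."""
    n, k = X.shape
    beta_full = np.linalg.solve(X.T @ X, X.T @ y)
    dets, betas = zip(*subset_fits(X, y, r))
    w = np.array(dets) / np.sum(dets)        # weights proportional to det
    B = np.array(betas)                      # one subset estimate per row
    beta_avg = w @ B                         # weighted average of subset LSEs
    dev = B - beta_full
    scale = (r - k + 1) / (n - r)            # scale factor quoted in the text
    var = scale * (dev.T * w) @ dev          # sum_s w_s dev_s dev_s^T
    return beta_full, beta_avg, var

rng = np.random.default_rng(0)
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

for r in (2, 5, 7):
    beta_full, beta_avg, var = weighted_jackknife(X, y, r)
    print(r, np.allclose(beta_full, beta_avg))
```

For r = n-1 the weights reduce to (1 - w_i)/(n-k), where w_i is the i-th leverage, so the estimate can be compared against the closed form built from the full-data residuals. Enumerating all subsets is only feasible for small n; a larger problem would need a sample of subsets.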

Several questions have been raised in the course of our study. We hope they will
generate further interest and research in this area.
I. We conjecture that Theorem 5 is still true for $v_{J,r}$ with small $d = n-r$, that is,
the delete-d jackknife variance estimator is robust against error heterogeneity.
Does $v_{J,r}$ enjoy other desirable properties? For example, is $v_{J,n-2}$ (delete-two
jackknife) robust against certain forms of error correlation?

II. Does there exist a bootstrap variance estimator that is robust against error
heterogeneity? For the bootstrap method to be model-free or model-robust as is sometimes
claimed (Efron and Gong, 1983), this is a very basic requirement.

III. The methods based on the bootstrap-histogram and the jackknife-histogram perform
disappointingly in the simulation. Refinements of these methods are called for. One
obvious defect of the resample histograms is that they have shorter tails than their
population counterparts. The handling of skewness may also be improper. The poor
performance of the percentile methods raises our doubt about the relevance of the
present asymptotic results on the bootstrap. Mathematical results that can explain
the empirical behavior are urgently needed.

IV. Is it possible to find a theoretical guide on the choice of subset size for the
jackknife method? One interesting possibility may start with the concept of "distance
matching" given in Section 7.

V. The scale factor $(r-k+1)/(n-r)$ in the retain-r jackknife method is used for
"distance matching". It can be applied either before or after the nonlinear
transformation (see (7.1) and (7.2)). It would be interesting to sort out the relative
merits of these two scaling methods.